EFFECT  OF  A GUIDANCE  UNIT  ON  TEST-TAKING  STRATEGIES 
ON  READING  TEST  SCORES  OF  SIXTH  GRADE  STUDENTS 


DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN 
PARTIAL  FULFILLMENT  OF  THE  REQUIREMENTS  FOR 
THE  DEGREE  OF  DOCTOR  OF  PHILOSOPHY 


1984 


ACKNOWLEDGMENTS 


I would  like  to  recognize  those  people  who  have  been 
most  influential  in  helping  me  attain  this  highest  goal  in 
my  educational  career.  My  deepest  thanks  go  to  my  mother, 
Eula  B.  Guess,  who  taught  me  to  read  at  an  early  age  and 
thereby  instilled  a love  of  learning  and  set  the  stage  for 
a lifetime of academic pursuits. Her constant encouragement
has been a strengthening force.

I am indebted to the members of my doctoral committee,
each of whom added a special touch to this endeavor:
Dr. R. D. Myrick, who has always been available when I
needed him, and who has been a steady, guiding force;
Dr. Harold Riker, who also has lent encouragement and
support; Dr. James Algina, who has given timely advice and
attention to proper statistical procedures; and Dr. Travis
Carter, who has spent much time in supervision of practica
and who has many times lent a helping hand. My thanks go
to these individuals, whom I consider to be my friends.

Certain teachers and professors had a special part in
helping me reach this point in my professional development.
Virgie Livingston, my first teacher, helped me to believe
that I could do anything. Evelyn Bozarth, my high school
English teacher, taught me how to write and instilled a
love and fascination for the beauty of language.
Dr. Dorothy Hoffman, eminent professor of modern languages
at Florida State University, furthered my love for language,
literature, and people. In guiding me to a keen appreciation
and understanding of the literary personages I came to
know, she taught me my first counseling skills. Dr. Harry
Padgett of Appalachian State University was largely responsible
for my becoming so committed to the counseling profession
that I couldn't give it up.

Finally, but equally important, I would like to express
my gratitude to my close friend, Ruby Leite, who
spent hours proofreading and who helped in many special
ways. She, too, has been a source of inspiration.




Analyses  

IV.  RESULTS  OF  THE  STUDY  

Summary  

V. SUMMARY, CONCLUSIONS, LIMITATIONS,
RECOMMENDATIONS AND IMPLICATIONS

Summary


Implications  

APPENDIX  A.  GUIDANCE  UNIT  

APPENDIX  B.  CHILDREN'S  TEST  ANXIETY  SCALE 

REFERENCES  

BIOGRAPHICAL  SKETCH 






in reading and math in preparation for the spring administration
of the Stanford Achievement Test. After the unit
was completed, experimental and control groups were posttested,
using the reading comprehension subtest of the CTBS,
Form V, and the CTAS.

Posttest  data  from  the  two  measures  were  collected  for 
300 students (E = 145; C = 155). Analysis of covariance
(ANCOVA)  appropriate  for  a hierarchical  design  was  used  to 
analyze  the  results.  Pretest  scores  were  used  as  covariates. 

According  to  the  ANCOVA,  there  was  no  significant 
interaction  between  pretest  levels  of  achievement  and 
anxiety  and  treatment  effect  on  posttest  reading  scores. 
Neither  was  there  a significant  effect  of  the  treatment  on 
reading  test  scores  or  on  test  anxiety,  according  to  this 
analysis.  However,  by  using  a pooling  approach,  the  effect 
of  the  treatment  on  test  anxiety  was  significant. 

It  was  concluded  that  although  no  significant  effects 
of  the  treatment  could  be  reported,  training  in  test-taking 
skills  is  nevertheless  important.  Younger  students  or 
students  who  were  more  test-naive  might  have  responded 
differently. 

Those  students  who  reported  a high  level  of  test 
anxiety  on  the  pretest  tended  to  score  low  on  the  reading 
posttest.  These  students  possibly  need  more  intensive  work 
in  small  groups  with  more  opportunity  for  practice  in  order 
to  benefit  from  this  type  of  training. 


effec- 


taking  strategies  for  standardized  tests,  on  elementary 


participated  in  a six-session  guidance  unit  on  test-taking 


skills,  and  the  effects  of  the  unit  were  investigated 
through  a research  design. 


The  following  research  questions  were  considered: 


1.  Did  a sixth  grade  guidance  unit  on  test-taking 
achievement  test? 


2.  Were  reading  test  scores  of  high  achieving  sixth 
grade  students  affected? 

3.  Were  reading  test  scores  of  low  achieving  sixth 
grade  students  affected? 


4.  Was  the  guidance  unit  on  test-taking  skills  more 
beneficial  to  high  achieving  or  to  low  achieving  students? 


5.  Was  the  guidance  unit  on  test-taking  skills  more 
beneficial  to  high  test  anxious  students  or  low  test 


Test  results  are  used  to  make  decisions  such  as  the 
makeup  of  instructional  groupings,  placement  in  educational 
programs,  promotion,  and  even  graduation  (Shuller,  1979). 
Because  of  the  importance  of  test  results  in  children's 
lives,  it  is  incumbent  upon  teachers  and  counselors  to  help 
ensure  that  test  data  are  true  indicators  of  children's 
abilities  (Brown,  1982). 

A number of educators believe that there may be sources
of variance in educational test scores besides item content
and random error. Test-wiseness has been suggested as one
such source (Dreisbach & Keogh, 1982; Ebel, 1965; Erickson,
1972; Fueyo, 1977; Gifford & Fluitt, 1978b; Millman, Bishop
& Ebel, 1965; Moore, Schutz & Baker, 1966; Oakland, 1972;
Wahlstrom & Boersma, 1968).

Possibly  great  numbers  of  students  show  depressed 
scores  on  standardized  tests  because  of  deficiencies  in 
test-taking  ability.  Ebel  (1965)  pointed  out  that  more 
error  in  measurement  is  likely  to  occur  when  students  have 
too  little,  rather  than  too  much,  skill  in  taking  tests. 

If  students  are  unable  to  mark  answer  sheets  accurately  or 
to  answer  questions  efficiently,  the  scores  they  receive 
are not valid indicators either of ability or of achievement
(Brown, 1982).

The professional literature provides a basis for analyzing
test-wiseness and suggests a framework for research
(e.g., Downey, 1977; Millman, Bishop & Ebel, 1965).


number  of  studies 


reported (e.g., Dreisbach & Keogh, 1982; Jongsma & Warshauer,
1975; Kalechstein, Kalechstein & Docter, 1981; Oakland,
1972), using a variety of techniques. In most instances
gains have been reported for the instructional groups,
although they have not always been statistically
significant (Jongsma & Warshauer, 1975). Yet more studies
are needed, especially at the elementary school level, to
determine whether or not it is practical to include
test-wiseness instruction as part of the educational
program in an elementary school and, subsequently,
as a part of undergraduate teacher training programs.

The role of anxiety as a source of variance in test
scores is another concern (Petty & Harrell, 1977; Rudman,
1976). Approximately 20% of school children and 25% of
college students are test anxious (Barrios & Shigetomi,
1979). However, the nature of the relationship between
test-wiseness and anxiety has not been examined (Woodley,
1975). None of the studies on teaching children test-taking
skills looked at the possible effect of test-wiseness
instruction upon the anxiety level of students or at the
possibility that the level of test anxiety experienced by
the children might influence the effectiveness of the
instruction. Will a child with high test anxiety profit
more or less from a guidance unit on test-taking skills
than the child with low test anxiety? This is an interesting
question worthy of consideration.


Also, in previous studies, as reported in the professional
literature, the main emphasis has been on test-taking
skills and practice alone. There is a need for more
comprehensive programs, programs which will address
test-taking attitudes and behaviors as well as specific strate-

Definitions  of  Terms 

The following terms are used in this study as defined
below.

Test-wiseness: A person's ability to utilize the
characteristics and formats of the test and/or the
test-taking situation to receive a high score.

Test Anxiety: The tendency of students to respond to
nervousness induced by testing situations with worried,
negative, self-centered thoughts and statements.

Organization  of  the  Study 

The  rest  of  the  study  will  be  organized  as  follows. 
Chapter  II  will  present  a review  of  the  related  literature. 
In Chapter III, the methodology of the study will be
described. The results of the study will appear in Chapter
IV.  A summary,  limitations,  and  possible  applications  of 
the  study  will  be  presented  in  Chapter  V. 


CHAPTER  II 

REVIEW  OF  THE  RELATED  LITERATURE 

The following chapter presents a review of the professional
literature relating to the use of tests in educational
programs. The beginning section deals with the
history, kinds, and uses of teacher-made and standardized
tests in the schools. In the second section various effectors
of students' performance on standardized tests are
discussed. Finally, the third and fourth sections address
more specifically the effects of levels of test-wiseness
and anxiety upon students' test scores.

Tests  in  the  Schools 

An important part of an educator's job is to evaluate
pupil progress. In order to do this effectively, the
teacher calls upon a variety of methods of appraisal, which
include observing the students in the classroom; sizing up
the students in conference, interview, and informal discussion;
and appraising the quality of assignments done
outside of school (Thorndike & Hagen, 1969). Teachers must
also  use  tests.  Frequently  the  testing  situation  becomes 
the  prime  basis  for  evaluation  for  both  the  teacher  and  the 
child. A brief discussion of the two types of pencil and
paper tests, teacher-made classroom tests and standardized
tests, is the focus of this section of the review.


Classroom  Tests 

Teacher-constructed  classroom  tests  are  used  often  to 
monitor  pupil  and  class  progress,  to  identify  the  need  for 
remediation,  to  motivate  students,  and  so  on.  They  tend  to 
be  specific  in  content  and  to  focus  upon  a particular  unit 
or  course.  A brief  history  of  classroom  testing,  kinds  of 
tests,  and  their  uses  in  the  classroom  will  be  discussed. 

History. It is generally assumed that teachers have
always measured or evaluated the progress of their students
(Noll & Scannell, 1972). In fact, some form of measurement
is inherent in the educational process (Stanley & Hopkins,
1972). The classroom teacher must constantly evaluate in
order to determine the academic and social growth of pupils
and to have a basis upon which to make numerous decisions
concerning their educational programs.

Information from early records indicates that early
evaluation of pupils was done through personal observation,
oral questioning, and subjective judgment by the teacher
(Noll & Scannell, 1972). In fact, until the advent of inexpensive
pencils and paper after the middle of the nineteenth
century, oral examinations were standard in our country
(Stanley & Hopkins, 1972).

In addition to teacher evaluation, it has been traditional
for citizens other than teachers to share some of
the responsibility for evaluating the progress of students
in the schools. In the early history of our country it was
customary to have a school committee of lay citizens to be
responsible for local schools in each community. These
committees would visit the schools in their districts once
a year and would examine the pupils by asking them questions.
These committees were forerunners of today's school
boards (Noll & Scannell, 1972). Thus, the importance of
measurement within the educational process has long been
recognized. In recent years, a mild revolution has occurred
concerning the importance of assessing individual pupil
progress in order to more effectively plan teaching methods
and procedures (Scannell & Tracy, 1975).

Kinds  of  classroom  tests.  Most  teacher-made  tests 
come  under  the  heading  of  criterion-referenced  tests,  in 
that  they  reflect  the  goals  and  objectives  of  instruction 
and  require  some  predetermined  level  of  mastery.  Types  of 
tests  include  those  in  which  examinees  are  required  to 
compose  the  answer,  or  free-response  tests,  and  those  that 
require  the  examinee  to  choose  an  answer  from  options,  or 
structured-response  tests.  The  former  include  short 
answer, completion, and essay tests, while the latter include
true-false and matching exercises and multiple-choice
tests.  Each  of  these  types  will  be  discussed  briefly. 

Completion tests consist of sentences with a word or
phrase omitted. Short-answer tests usually differ from
completion tests in that a complete question is presented
rather than a sentence stem with some words omitted, but
they require the same type of response. Essay tests, in
contrast, require the examinee to recall appropriate material,
organize it, and present the answer in a clear, concise
manner. This type of test is more difficult to evaluate
objectively, and it is important that essay questions be
stated so that the form and scope of the response are clear
both to the examinee and to the examiner (Theobald, 1974;
Scannell & Tracy, 1975).

True-false items on a test consist of a statement
which examinees judge to be either true or false. This
type of item is compact, requiring a small amount of space
and a minimum of reading and response time by the examinee.
It is also easily composed by the test-maker. Matching
items usually consist of a set of terms, names, incomplete
statements, phrases, or definitions, called stimuli, and a
set of names, terms, definitions, or pictures, called
responses. They provide a compact way to test a number of
concepts simultaneously. Multiple choice items consist of
a stem in the form of a question or incomplete sentence
followed by a series of possible answers, one of which is
correct or better than the others. Multiple choice items
can be used to measure almost all achievements that can be
measured by means of pencil and paper tests. These last
three types of items are usually termed objective items,
referring only to objectivity of scoring. Since only one
response is designated as the correct one, scoring is simple
and efficient.

Uses  of  classroom  tests.  Scannell  and  Tracy  (1975) 
suggested  that  the  role  of  classroom  measurement  can  be 
divided  into  two  major  parts  running  along  one  continuum: 
the  role  of  analysis  during  the  learning  process  and  the 
role  of  assessing  the  level  of  achievement  at  the  end  of  a 
semester  or  of  an  academic  year.  The  first  role  can  be 
termed  formative  evaluation  and  the  second  summative 
evaluation.  In  addition,  evaluation  should  occur  at  the 
beginning,  middle,  and  end  of  a unit  of  study.  The  purpose 
will  be  different  at  each  of  these  stages  and  will  affect 
subsequent  decisions  and  actions  differentially.  Before 
instruction  begins,  evaluation  of  pupils'  current  level  of 
achievement  will  determine  objectives,  content,  and 
planning  of  instruction.  Thus,  testing  can  serve  as  a tool 
for diagnosis (Thorndike & Hagen, 1969). If students have
not mastered prerequisite skills, the instruction will consist
of review and remedial work before moving on. Sometimes
it may be ascertained that students have already
developed the skills included in the plan for teaching, and
the material can be eliminated altogether. During the
course of study, the teacher may observe that some students
are not making satisfactory progress. Periodic testing and
further diagnosis can provide feedback to both teacher and
student and may serve to alter the teaching-learning plan.
Finally, at the end of a unit the teacher needs to know how
successful the instructional unit has been in helping students
achieve the goals and objectives previously laid out.
Again, a well-constructed test can provide this information.

In addition, both teachers and students tend to view
tests as having a strong motivating function (Thorndike &
Hagen, 1969). Tests determine to some degree when students
study, what they study, and how they study.

Tests  provide  considerable  information  upon  which  to 
base  decisions  and  judgments  as  to  future  educational 
planning  for  students.  For  example,  decisions  about 
whether  to  allow  students  to  pursue  certain  courses  in  high 
school,  whether  to  consider  higher  education,  or  which 
occupations  to  consider  may  be  based  in  large  measure  upon 
information  gathered  from  tests. 

Standardized  Tests 

In contrast to classroom tests, standardized tests
tend to focus on broader, more general kinds of information
than would be included among the educational objectives of
most school systems. They span a much wider range of content
and are administered much less frequently than teacher-made
tests. In addition, the norms provided by standardized
tests make it possible for an individual's scores to
be compared with the scores of other individuals across the
nation. Such comparison is necessary for purposes such as
quality control, curricular evaluation, counseling, and
identifying exceptional children (Stanley & Hopkins, 1972).


Standardized  tests  are  given  with  prescribed  directions, 
time  limits,  and  other  controls,  providing  more  meaningful 
bases  for  comparing  performance.  Standardized  tests  are 
carefully  developed  and  refined  by  item  analysis  so  that 
the  reliability  of  most  standardized  tests  is  greater  than 
that  of  teacher-made  tests.  Thus,  standardized  tests  serve 
a different  purpose  than  classroom  tests  and  provide  a 
different  type  of  information.  Both  are  useful  to  the 
educator  in  evaluating  pupil  progress.  This  discussion  of 
standardized  tests  will  include  a brief  history,  the  kinds 
of  standardized  tests  now  in  use,  and  their  purpose  and  use 
in  education  today. 

History. As has been mentioned earlier, appraisal of
educational achievement in the United States before 1850
consisted of oral examination. After the advent of inexpensive
paper and pencils in the mid-nineteenth century,
written tests became more popular. The written examination
was more efficient than the oral examination in that it
presented the same task to each member of the group, and it
allowed each pupil to work for the full examination period
(Thorndike & Hagen, 1969). However, evaluation of test
items was still highly subjective.

The first steps toward the scientific use of measurement
in education were made by Horace Mann, who had been
appointed Secretary of the Massachusetts State Board of
Education prior to 1845 (Stanley & Hopkins, 1972; Noll &
Scannell, 1972). As an outcome of his observations of
weaknesses in the public schools, written examination questions
in history, arithmetic, geography, definitions,
grammar, natural philosophy, and astronomy were written to
be answered by the pupils. This was the first recorded
instance of giving the same written examination to a sample
of all pupils in a school system. The results revealed

many  of  the  pupils,  dramatically  pointing  out  the  weakness 

In  England,  a few  years  later,  a schoolmaster  named 
George  Fisher  prepared  a scale  of  handwriting  against  which 
samples  of  children's  handwriting  could  be  compared,  along 
with  a standard  list  of  spelling  words  and  questions  in 

Following these beginnings, other important advances
were made in the development of measurement in education.
Sir Francis Galton demonstrated, by using "ingenious tests
and by statistical methods" (Noll & Scannell, 1972, p. 22),
that individuals vary in physical, sensory-motor, and personality
traits. James McKeen Cattell, an American psychologist,
became interested in the problem of individual
differences and conducted a number of experiments in
sensory-motor abilities. He believed that such differences
would reflect differences in intelligence. In 1895, J. M.
Rice devised a series of exercises to test spelling ability
among students in the Massachusetts schools. He found
great variation from class to class, school to school, and
city to city. He conducted similar studies in the areas of
arithmetic and language.

The  work  of  these  early  researchers  was  continued  at 
the  beginning  of  the  twentieth  century  by  men  like  Edward 
L.  Thorndike,  who  wrote  the  first  book  on  statistical 
methods,  and  Alfred  Binet,  who  devised  the  first  successful 
intelligence test (Noll & Scannell, 1972). At the same time
attempts  toward  measurements  of  emotion,  interests,  and 
personality  were  being  made. 

When the United States entered World War I in 1917, it
soon  became  apparent  that  the  then  current  methods  of 
classifying  men  were  inadequate.  Out  of  the  need  for  more 
efficient  procedures  came  the  Army  Alpha,  the  first  group 
intelligence  test.  This  was  followed  by  the  Army  Beta,  to 
be  used  for  those  with  less  than  sixth  grade  reading 
ability.  After  the  war,  these  tests  were  released  for 
general  use  and  administered  to  thousands  of  high  school 
and  college  students.  Within  a few  years,  numerous  other 
group  intelligence  tests  appeared,  as  well  as  survey  tests, 
personality  tests,  and  aptitude  tests. 


In World War II, the use of tests became widespread in
all of the armed forces. New development was seen in the
devising  of  aptitude  tests  for  placement  of  personnel  in 
areas  such  as  navigation,  radio,  and  radar.  The  use  of 
clinical  instruments  such  as  the  Rorschach  was  stimulated. 
At  the  same  time,  as  more  adolescents  entered  secondary 
school  and  the  curriculum  expanded  to  include  vocational 
training,  there  was  an  increased  use  of  tests  and  other 
measuring  instruments. 

Since World War II, the development and use of standardized
tests have steadily increased. Government legislation
has brought about wide-scale testing to identify
students with outstanding abilities, as well as those who
are educationally deprived. The testing movement appears
to have reached its zenith in our modern educational systems,
where test scores are used as the basis for most of
the important decisions concerning students' educational
and vocational choices.

Kinds  of  standardized  tests.  Standardized  tests  can 
be  classified  in  various  ways.  For  example,  they  can  be 
classified  on  the  basis  of  administrative  procedures,  such 
as  individual  versus  group.  Usually  they  are  classified  as 
to  what  is  being  measured.  On  this  basis,  they  can  be 
categorized  as  aptitude  tests,  achievement  tests,  and 
interest, personality, and attitude inventories (Mehrens &
Lehmann, 1972).

The  first  category,  aptitude  tests,  includes  measures 
of  scholastic  aptitude,  or  intelligence  tests,  and  aptitude 
tests  which  measure  capacity  in  specific  fields  such  as 
music  or  mechanics.  The  results  of  aptitude  tests  are  used 
as  predictors  of  some  future  performance  by  an  individual. 

Numerous attempts have been made to define intelligence.
However, educators and psychologists have never
been able to agree on any one definition (Noll & Scannell,
1972; Stanley & Hopkins, 1972). Stanley and Hopkins (1972)
referred to a series of articles published in 1921 in the
Journal of Educational Psychology by fourteen prominent
psychologists, each of whom was asked to present his own
conception of the nature of intelligence. Although there
was some agreement, fourteen different ideas of intelligence
emerged. Some saw it as the degree of adaptation to
the environment; some as the ability to learn, the ability
to think abstractly, or the degree of past learning. The
definition that finally evolved was that intelligence is
what intelligence tests measure and that it is what determines
success in an academic setting. At any rate, what
any given intelligence test measures depends on how its
author(s) view the nature of intelligence. In recognition
that intelligence is multi-faceted, most intelligence tests
include both verbal and non-verbal measures, whether they
be individual or group tests. They are used as predictors
of success in school and college.




The  tests  which  are  commonly  labeled  aptitude  tests 
are  those  designed  to  measure  skills  and  knowledge  which  are 
necessary  for  successful  performance  in  specific  fields. 
These  can  be  subdivided  into  tests  designed  for  vocational 
guidance,  those  designed  for  prognosis  and  prediction  in 
special  school  subjects  and  in  special  types  of  schools, 
and  those  designed  to  assess  creativity  (Thorndike  & Hagen, 

Psychologists  came  to  realize  the  need  for  some  basis 
upon  which  to  guide  young  people  into  the  types  of  work 
they  were  most  suited  for  and  in  which  they  would  be  most 
satisfied  and  efficient.  As  they  began  to  study  different 
jobs,  they  noted  that  different  ones  required  different 
abilities  and  different  levels  of  mental  ability.  At  the 
same  time,  research  showed  that  human  abilities  are 
specialized to some degree. As a result of further research
on the nature of abilities and on the validity of
specific  tests  for  specific  jobs,  psychologists  designed 
numerous  aptitude  test  batteries  to  be  used  in  educational 
and  vocational  guidance  and  in  personnel  selection  and 
classification.  Two  of  the  best  known  of  these  are  The 
Differential  Aptitude  Test  Battery  and  the  General  Aptitude 
Test  Battery. 

Aptitude tests designed to predict readiness to
learn or probability of success in some specific area of
education are called prognostic tests. For example,
reading readiness tests are widely used with children entering
school for the first time to give the school some idea
of the child's ability to learn to read. From this information,
teachers can organize a reading program geared to
the ability levels of the children. Other aptitude tests
have been developed for the purpose of selecting individuals
for certain types of professional training, such as
medicine, engineering, and musical or artistic training.

Still  other  aptitude  tests  seek  to  measure  creativity. 
Such  tests  attempt  to  measure  divergent  thinking,  or  the 
ability  to  devise  multiple  responses  to  a problem.  These 
tests  are  still  being  studied  and  have  not  as  yet  achieved 
wide  utilization  either  in  schools  or  in  industry. 

Standardized achievement tests, in contrast to aptitude
tests, are designed to measure what a student has
learned,  rather  than  what  he  or  she  is  capable  of  learning. 
They  register  the  present  degree  of  learning  or  achievement 
after  instruction  has  been  given.  There  are  two  types  of 
standardized  achievement  tests:  survey  and  diagnostic. 

The  survey  battery  is  most  often  used  when  an  overall 
measure  of  achievement  in  the  most  common  subjects  of 
instruction is needed for purposes of grade placement or
grouping. It is a convenient instrument for examining the
general  strengths  and  weaknesses  of  individual  pupils.  It 
is  well  adapted  for  comparing  a pupil's  achievement  in 
different  areas  because  the  norms  for  all  of  the  subtests 
are  based  on  the  same  population  sample. 




Whereas a survey achievement test tries to provide an
overall appraisal of achievement, a diagnostic test provides
a much more detailed picture of strengths and weaknesses
in a particular area. This more detailed analysis
can suggest causes for deficiencies and provide a guide for
remediation. The two types of achievement tests are used
for different purposes. The survey test provides tasks of
mostly moderate difficulty to produce a spread of scores
for the whole range of pupils; the diagnostic test, on the
other hand, should be easy for most pupils and should
identify the few at the bottom who have special difficulty
with a specific subskill (Thorndike & Hagen, 1969). The
two complement each other, and in many instances it is
desirable to use both in evaluating the status of a particular
student.

The third category of standardized tests, comprising
interest, personality, and attitude inventories, consists
mostly of self-report measures. These are not usually
administered or interpreted by the classroom teacher and
therefore will be only briefly discussed here.

Self-report  interest  inventories  are  mainly  used  by 
high  school  and  college  students  who  want  to  know  more 
about  themselves  so  that  they  can  make  better  decisions 
about  their  educational  and  vocational  future.  Interest 
inventories  help  answer  such  questions  as  which  vocational 
activities  are  most  appealing  to  a person,  which  interests
are  stronger,  and  which  careers  are  related  to  those
interests.  They  are  not  designed  to  predict  job  success,
academic  success,  job  satisfaction,  or  personality  adjust-
ment.

Most  personality  tests  currently  in  use  are  of  the 
pencil  and  paper,  self-report  variety,  in  which  the 
examinee  is  presented  with  a series  of  questions  depicting 
typical  behavior  patterns.  Some  of  these  tests  measure 
only  one  trait  dimension,  whereas  others  measure  several 
traits  at  once.  Scores  consist  of  the  number  of  items 
answered  by  a person  in  the  direction  which  supposedly 
displays  those  traits.  Other  types  of  standardized  per- 
sonality assessment  devices  are  problem  checklists,  general 
adjustment  inventories,  and  unstructured  inventories  such 
as  projective  tests.  There  are  inherent  problems  in  per- 
sonality inventories.  For  example,  individuals  can  change 
within  a short  period  of  time  so  that  any  given  score  may 
be  valid  only  for  the  time  that  the  test  was  administered. 
The  "fakeability"  of  such  inventories  can  also  pose  a 
problem.  However,  in  spite  of  their  shortcomings,  these 
tests  do  provide  a more  complete  understanding  of  an  indi- 
vidual. In  addition,  they  give  the  individual  an  oppor- 
tunity to  express  and  discuss  feelings. 
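
The scoring rule described above, in which a score is the number of items answered in the direction keyed to a trait, can be illustrated with a minimal sketch. The function name, the five-item answer key, and the responses are hypothetical and are not taken from any published inventory.

```python
def score_inventory(responses, key):
    """Count the items answered in the keyed (trait-indicating) direction."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for answer, keyed in zip(responses, key) if answer == keyed)


# Hypothetical five-item true/false scale: the key lists the answer that
# is assumed to indicate the trait for each item.
key = [True, False, True, True, False]
responses = [True, True, True, False, False]

trait_score = score_inventory(responses, key)  # items 1, 3, and 5 match the key
```

Because the score is only a count of keyed answers, it carries the limitations noted above: it reflects how the examinee chose to respond on one occasion.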

Students'  attitudes  toward  education  play  a signifi- 
cant role  in  determining  how  they  function  in  the  classroom. 
Attitude  scales  are  useful  in  identifying  problem  attitudes
that  keep  students  from  progressing  as  much  as  they
are  capable.  However,  attitude  inventories  usually  do  not
have  norms  and  thus  are  less  valid  and  reliable  than  other 
measures.  Like  personality  inventories,  they  can  be  help- 
ful only  if  the  subject  has  been  truthful. 

Uses  of  standardized  tests.  The  value  of  a testing 
program,  in  the  final  analysis,  rests  on  the  uses  made  of 
the  results.  It  can  be  said  that  the  main  function  of 
standardized  tests  is  to  help  in  making  educational  deci- 
sions which  can  be  categorized  under  three  headings: 
instructional,  guidance,  and  administrative  (Mehrens  &
Lehmann,  1975;  Thorndike  & Hagen,  1969).  Each  of  these
categories  will  be  considered  in  this  discussion. 

Scholastic  aptitude,  or  intelligence  tests,  can  be 
used  by  the  classroom  teacher  as  an  aid  in  understanding 
individual  pupils  in  the  class  and  in  providing  materials 
and  learning  experiences  which  will  be  beneficial  to  those 
pupils.  Aptitude  tests,  correctly  used,  can  help  the 
teacher  to  develop  realistic  expectations  for  individual 
pupils:  whether  the  pupil  may  be  expected  to  progress  as
fast  as  others  in  the  class,  whether  the  pupil  may  need
remediation,  or  whether  the  pupil  should  be  provided  with
special  enrichment  activities  in  addition  to  regular  class-
room work.  Knowing  something  about  students'  ability
level  can  also  help  the  teacher  determine  whether
the  students  are  learning  as  much  as  would  be  expected
at  their  ability  level.

Aptitude  tests  can  be  useful  as  sources  of  information 
in  educational,  vocational,  and  personal  counseling.  In 
educational  guidance  this  information  is  important  in  help- 
ing students  decide  on  educational  objectives  such  as 
whether  to  plan  for  college  or  what  type  of  high  school 
curriculum  to  select.  General  aptitude  tests  often  provide 
information  on  how  to  deal  with  problem  children  where  the 
problems  may  stem  from  a program  that  is  below  or  above  the 
child's  academic  potential.  In  vocational  counseling,  these 
scores  are  useful  also  because  of  differing  educational  re- 
quirements for  different  occupations.  Special  aptitude 
tests  can  be  used  as  supplements  to  the  general  intelli- 

Ways  in  which  aptitude  tests  can  be  used  administra- 
tively include  using  test  results  as  one  basis  for  forming 
classroom  groups  and  in  some  selection,  classification,  and 
placement  decisions,  such  as  which  students  are  eligible 
for  placement  in  special  programs.  Another  possible  ad- 
ministrative use  of  aptitude  tests  is  for  relevant  supple- 
mentary information  to  be  used  in  curriculum  planning  and 
evaluation  on  a schoolwide  basis  (Mehrens  & Lehmann,  1975).

Achievement  tests  can  be  valuable  to  the  classroom 
teacher  in  planning  instruction  for  individual  students  and 
for  the  class  as  a whole.  One  widespread  use  of
achievement  tests  is  for  diagnostic  purposes.  Scores
can  be  used  to  identify  individual  students  who  need  more
intensive  study.  Also  a teacher  can  inspect  an  individual 
set  of  answers  in  order  to  get  cues  about  the  nature  of  the 
difficulties  the  student  is  experiencing  and  thus  plan  more 
effectively  for  remedial  instruction.  Similarly,  test 
scores  can  help  identify  those  students  who  need  enrichment 
activities  to  keep  them  from  being  bored  with  the  regular 
course  of  study.  Achievement  test  batteries  also  help  the 
classroom  teacher  to  identify  the  strengths  and  weaknesses 
of  the  class  as  a whole  and  to  recognize  the  range  of 
achievement  levels  that  he  or  she  must  handle.  This  may 
lead  to  modification  of  the  teaching-learning  plan.  In 
addition,  evaluation  of  a specific  teaching  method  and 
appraisal  of  progress  can  be  achieved  by  means  of  standard- 
ized test  scores. 

Achievement  test  scores  can  be  of  help  in  counseling 
students  about  educational  and  vocational  decisions. 
Counseling  may  consist  of  talking  to  an  elementary  school 
student  about  strengths  and  weaknesses  or  helping  a high 
school  senior  to  select  a college  major.  Achievement  test 
scores  are  also  being  used  to  help  students  to  select  a 
college  (Mehrens  & Lehmann,  1975).  Information  provided  by
such  tests  as  the  ACT  and  CEEB  can  enable  the  high  school 
counselor  to  predict  success  in  college  from  test  scores. 
Thus,  students  can  be  guided  to  institutions  where  they 
might  be  most  likely  to  succeed. 


Administrative  uses  of  standardized  achievement  test
scores  are  similar  to  those  of  aptitude  test  scores.
Achievement  test  scores  are  also  used  as  a basis  for  form- 
ing classroom  and  instructional  groups  and  in  selection, 
placement,  and  classification  of  students  into  special  edu- 
cational programs.  In  addition,  school  administrators  may 
make  use  of  standardized  achievement  tests  to  help  make 
educational  decisions  regarding  the  effectiveness  of  the 
school's  curriculum.  Test  results,  along  with  other  data, 
may  indicate  a need  for  revision  of  the  current  curriculum 
or  adoption  of  a new  one.  In  some  instances  test  scores 
are  used  as  a basis  for  evaluating  teachers.  This  practice 
has  been  seriously  questioned  (Thorndike  & Hagen,  1969; 
Mehrens  & Lehmann,  1975).  It  fails  to  take  into  account  a
number  of  important  considerations.  However,  debating  the 
issue  is  not  within  the  scope  of  this  discussion. 

Attitude  scales  are  useful  to  the  classroom  teacher 
and/or  the  school  counselor  as  one  more  source  of  informa- 
tion to  be  used  to  better  understand  students  and  their 
motives.  Better  understanding  leads  to  better  rapport  with 
students,  which  in  turn  enhances  the  learning  environment. 

Pitfalls  of  standardized  testing.  Various  functions 
of  standardized  test  results  have  been  identified  and  dis- 
cussed. Some  cautions  are  also  in  order.  The  intelligence 
test,  for  example,  is  generally  a measure  of  ability  to 
work  with  symbols,  abstract  ideas,  and  their  relationships. 


A  pupil  who  scores  low  on  an  intelligence  test  may
possess  skills  not  measured  by  the  test  but  may  have
ability  in  other  areas  such  as  music,  art,  or  mechanics. 
Standardized  tests  may  discriminate  against  pupils  with 
poor  verbal  ability,  with  different  cultural  backgrounds, 
with  emotional  problems,  or  with  certain  learning  handicaps. 
Any  test  score  should  be  considered  as  only  one  piece  of 
information  about  a child  on  a given  day  under  a given  set
of  circumstances  and  should  be  considered  along  with
other  information  concerning  the  child.  When  interpreted
and  used  with  care  and  with  flexibility,  test  scores  can 
provide  valuable  information  that  cannot  be  obtained  in  any
other  way.


Self-Concept 


The  importance  of  the  self-concept  as  a factor  in  aca- 
demic achievement  is  generally  recognized  (Beane,  1982; 
Leviton,  1975;  Melvin,  1982;  Mitchell  & McCollum,  1983). 
Self-concept,  or  self-image,  can  be  defined  as  "what  indi- 
viduals think,  see,  believe,  and  feel  about  themselves" 
(Melvin,  1982).  When  children  are  accepted,  approved,
respected,  and  liked  for  what  they  are,  they  are  able  to 
develop  an  attitude  of  self-acceptance,  which  frees  them 
to  function  at  a level  commensurate  with  their  abilities. 
This  generalizes  to  the  testing  situation,  where  how  chil- 
dren feel  about  themselves  influences  test  behavior 
(Kirkland,  1971).  If  they  feel  that  they  can  and  will  do 
well  on  a test,  there  is  a strong  likelihood  that  they  will 
perform  in  that  manner.  Conversely,  if  they  expect  to 
perform  poorly,  the  reverse  will  be  likely  to  occur. 

As  cited  by  Leviton  (1975),  in  early  studies  investi-
gating the  relationship  between  self-concept  and  academic
achievement,  achievement  was  measured  in  terms  of  grade 
point  averages,  which  introduced  the  problem  of  subjectiv- 
ity into  the  results.  He  suggested  that  more  objective 
criterion  measures,  such  as  standardized  achievement  tests, 
should  be  used. 

Later  studies  attempted  to  remedy  this  limitation.  In 
a study  with  sixth  grade  students,  Williams  and  Cole  (1968) 
utilized  the  Tennessee  Self-Concept  Scale  as  an  objective 


measure  of  self-esteem  and  the  reading  and  arithmetic  sec- 
tions of  the  California  Achievement  Test  as  measures  of 
achievement.  They  found  a significant  correlation  between 
self-concept  and  reading  and  arithmetic  achievement. 

Primavera,  Simon,  and  Primavera  (1974)  investigated 
the  relationship  between  academic  achievement  and  self- 
esteem with  reference  to  possible  sex  differences.  The 
Coopersmith  Self-Esteem  Inventory  was  administered  to  fifth 
and  sixth  grade  students  from  a Catholic  school.  Subtests 
from  the  Stanford  Achievement  Test,  as  well  as  appropriate 
grade  levels  of  the  mathematics  and  reading  tests  for  New 
York  State  Elementary  Schools,  were  used  as  measures  of 
academic  achievement.  Significant  correlations  between 
self-esteem  and  achievement  test  scores  were  found  for  both 
the  total  and  female  groups.  Only  the  correlation  between 
self-esteem  and  the  New  York  State  Mathematics  Test  was 
significant  for  the  male  group. 

Simon  and  Simon  (1975)  explored  the  relationship  be- 
tween self-esteem  and  standardized  academic  achievement  and 
between  self-esteem  and  intelligence.  The  Coopersmith  Self-
Esteem  Inventory,  the  SRA  Achievement  Series,  and  the  Lorge-
Thorndike  Intelligence  Tests,  administered  to  fifth  graders
in  a New  York  public  school,  were  used  to  measure,  respec- 
tively, self-esteem,  achievement,  and  intelligence.  Sig- 
nificant correlations  were  found  between  self-esteem  and 








Verbal  and  social  reinforcement.  Ferrell  et  al. 

(1980)  investigated  the  effects  of  race  of  examiner  and 
type  of  reinforcement  upon  the  WISC-R  performance  of  a 
group  of  lower  class  black  children.  Children  were  randomly
assigned  to  either  a black  examiner  or  to  a white  examiner 
to  form  two  groups.  Within  each  group  an  equal  number  of 
subjects  were  assigned  to  a nonreinforcement  group,  a candy 
reward  group,  a traditional  social  reward  group,  or  a 
culturally  relevant  social  reward  group.  In  the  latter, 
the  verbal  reinforcement  consisted  of  phrases  such  as  "nice 
job,  young  blood"  and  "good  work,  little  brother,"  as 
opposed  to  the  traditional  "good"  or  "fine."  Tangible
reinforcement  was  most  effective  with  the  white  examiner, 
with  the  culturally  relevant  verbal  encouragement  the  next 
most  effective  technique.  However,  with  the  black  examiner 
tangible  reinforcement  and  culturally  relevant  verbal  rein- 
forcement were  equally  effective. 

Adelman  and  Chaney  (1982)  found  verbal  encouragement 
to  be  effective  in  enhancing  motivation  to  perform  on  the 
coding  subtest  of  the  WISC-R  with  a group  of  children  who 
exhibited  learning  and  behavior  problems.  However,  the 
same  experimental  procedure  had  no  effect  on  a group  of 
nonproblem  children. 

Tangible  reinforcement.  Ayllon  and  Kelly  (1972)  found 
that  giving  tokens  as  reinforcers  for  each  correct  response
significantly  raised  scores  on  the  Metropolitan  Readiness
Test  for  a group  of  trainable  retardates.  In  a second  ex-
periment  in  the  same  study,  token  reinforcement  for  correct
responses  significantly  raised  the  scores  of  normal  children
on  the  Metropolitan  Achievement  Test.  Edlund  (1972)  worked  with  young
children  from  low-middle  class  and  lower-class  homes,  using 
candy  to  reward  each  correct  response  on  the  Stanford  Binet. 
Subjects  receiving  reinforcers  had  much  higher  rates  of 
correct  responses,  and  thus  higher  IQ  scores,  than  those 
who  were  not  reinforced.  Taylor  and  White  (1982),  working
with  second  grade  students  in  twelve  Title  I schools,  found 
that  using  money  as  a reinforcer  was  more  effective  than 
either  training  students  in  test-wiseness  or  training 
teachers  in  appropriate  test  administration  techniques. 

Feedback.  Bridgeman  (1974)  designed  a study  to  assess
the  immediate  effects  of  success  feedback,  failure  feedback, 
or  no  feedback  on  a power  test  of  academic  aptitude  given 
to  seventh  grade  students.  Items  for  the  pretest  and  post- 
test were  taken  from  the  nonverbal  portion  of  the  Lorge- 
Thorndike  Intelligence  Test.  Two  days  after  the  pretest, 
the  posttest  was  administered.  Just  preceding  the  taking 
of  the  posttest,  students  were  given  their  "score"  on  the
pretest,  which  was,  in  fact,  a randomly  assigned  score  of 
96,  53,  or  no  score.  On  the  accompanying  interpretation 
sheet,  the  score  90  or  above  was  designated  as  excellent, 

59  and  below  was  designated  as  poor,  and  those  receiving 




these  studies.  Anastasi  (1976,  p.  34)  advised  following 
standardization  procedures  "to  the  minutest  detail"  and 
suggested  that  children  given  a prize  whenever  they  give  a 
correct  response  to  a test  item  cannot  be  directly  compared 
with  children  who  received  only  the  standard  verbal  en- 
couragement. O'Connor  and  Weiss  (1974)  discussed  the 
matter  of  using  contingent  reinforcement  to  raise  test 
scores.  They  pointed  out  that  if  populations  from  which 
samples  are  drawn  are  identical  with  regard  to  degree  of 
motivational  deficit  in  test-taking  situations,  then  using 
contingent  reinforcement  would  simply  cause  an  upward  shift 
in  the  distribution  of  scores,  and  examinees  would  maintain 
the  same  relative  position  in  comparison  with  other  exami- 
nees. Only  if  the  sample  populations  showed  differential 
motivational  deficits  could  the  use  of  contingent  reinforce- 
ment procedures  reduce  the  error  variance.  Obviously, 
children  who  are  already  highly  motivated  do  not  need  such 
procedures,  so  the  practicality  of  working  with  hetero- 
geneous groups  comes  into  question. 
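
The first part of O'Connor and Weiss's argument, that a uniform gain shifts the distribution without changing relative standing whereas unequal gains can reorder examinees, can be illustrated with a minimal sketch. All scores and gains below are hypothetical values chosen for illustration; they are not data from any study cited here, and the sketch does not model error variance itself.

```python
def ranks(scores):
    """Return each examinee's rank (1 = highest score)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]


def apply_gain(scores, gains):
    """Add a reinforcement-related gain to each examinee's score."""
    return [score + gain for score, gain in zip(scores, gains)]


# Hypothetical raw scores for five examinees tested without reinforcement.
baseline = [42, 55, 61, 48, 70]

# Identical motivational deficit: every examinee gains the same amount.
uniform = apply_gain(baseline, [5, 5, 5, 5, 5])

# Differential deficit: only two poorly motivated examinees gain.
differential = apply_gain(baseline, [15, 0, 0, 10, 0])

same_order_uniform = ranks(uniform) == ranks(baseline)
same_order_differential = ranks(differential) == ranks(baseline)
```

Under these assumed values the uniform gain leaves every rank unchanged, while the differential gain moves the first and fourth examinees ahead of the second. Only in that second case could the procedure change how examinees compare with one another.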

Testing  Conditions 

The  two  conditions  most  often  mentioned  as  affecting 
test  performance  are  the  physical  environment  and  charac- 
teristics of  the  examiner.  Both  factors  should  be  con- 
sidered when  standardized  tests  are  administered. 




Physical  environment.  In  order  to  provide  optimal 
conditions  for  testing,  the  testing  room  itself  should 
provide  adequate  lighting,  ventilation,  seating  facilities, 
and  freedom  from  noise  and  distraction  (Anastasi,  1976). 

affect  test  scores  (Seitz,  Abelson,  Levine,  & Zigler,  1975).

The  examiner.  A few  studies  have  investigated  the 
effect  of  examiner  characteristics  on  test  scores  and  have 
found  that  a flexible,  reinforcing  approach  versus  a rigid 
methodical  procedure  may  affect  examinees'  responses 
(Exener,  1966),  and  that  the  race  of  the  examiner  can
affect  IQ  scores  (Ferrell  et  al.,  1980;  Mishra,  1980;
Watson,  1972).  However,  Irons  (1981)  found  no  significant

WISC-R  Verbal,  Performance,  or  Full  Scale  scores. 

The  importance  of  the  examiner-examinee  relationship 
has  been  discussed  in  the  literature  (e.g.,  Anastasi,  1976; 
Kirkland,  1971;  Sarnacki,  1979).  Sarnacki  (1979)  suggested 
that  the  characteristics  of  the  examiner  could  be  one  of 
the  correlates  of  test-wiseness,  since  the  test-giver  is
the  primary  stimulus  cue  in  a testing  situation.  He  fur- 
ther stated  that  the  examiner's  behavior  and  replies  to 

It  appears  that  further  research  in  this  area  is
needed.  The  indication  from  the  relatively  few  studies
that  have  been  done  is  that  the  examiner  plays  a signifi-
cant role  in  the  test-taking  situation,  and  that  the
examiner-examinee  relationship  is  another  variable  that 
holds  potential  for  affecting  test  scores. 

Test  Anxiety 

High  test-anxious  students  typically  score  lower  on 
tests  than  do  low  test-anxious  students  (Anastasi,  1976; 
Kirkland,  1971;  Petty  & Harrell,  1977;  Wine,  1971). 
Spielberger  et  al.  (1978)  defined  test  anxiety  as  the  ten- 
dency of  people  to  respond  to  nervousness  induced  by  test- 
ing situations  with  worried,  negative,  self-centered 
thoughts  and  statements.  Thus,  the  person  who  is  highly 
test-anxious  comes  to  the  testing  session  experiencing 
intense  emotions  and  finds  it  difficult  to  focus  adequate 
attention  on  the  task  at  hand  (Clawson,  Firment,  & Trower,
1981;  Wine,  1971). 

Since  the  subject  of  test  anxiety  will  be  discussed  in 
detail  in  a later  section,  it  is  merely  mentioned  here  as 
an  effector  of  test  performance.  However,  a few  generali- 
zations can  be  made. 

1.  In  general,  there  is  a negative  relationship  be- 
tween level  of  ability  and  test  anxiety.  Poorer  students 
are  usually  most  anxious  when  facing  a test. 

2.  There  is  a positive  correlation  between  level  of 
aspiration  and  test  anxiety.  The  more  important  the  test 
is  to  the  individual,  the  more  anxiety  is  experienced. 


3.  Subjects  react  differently  to  anxiety,  with  some
being  more  immobilized  than  others.

Test-Wiseness 

Test-wiseness  has  been  identified  as  a source  of  error 
variance  in  standardized  test  scores  (Ebel,  1965;  Erickson, 
1972;  Fueyo,  1977;  Millman,  Bishop,  & Ebel,  1965).  Possibly 
great  numbers  of  students  perform  poorly  on  standardized 
tests  because  they  are  deficient  in  test-taking  abilities 
and  not  because  of  lack  of  knowledge. 

It  has  been  demonstrated  that  teaching  test-wiseness 
skills  can  result  in  improved  scores  on  standardized  tests 
(e.g.,  Jongsma  & Warshauer,  1975;  Oakland,  1972;  Woodley,
1972).  Furthermore,  it  has  been  shown  that  instruction  in 
test-taking  strategies  is  beneficial  to  students  at  all 
ability  levels  (Dunn  & Goldstein,  1959;  Diamond  & Evans,
1972).

This  review  focuses  mainly  on  the  effect  of  instruc- 
tion in  test-taking  strategies  on  standardized  test  scores 
and  on  test  anxiety,  which  is  known  to  be  an  effector  of 
standardized  test  scores.  It  is  the  belief  of  the  investi- 
gator that  test-wiseness  can  also  have  an  effect  on  the 
other  effectors  presented  in  this  section — self-concept, 
motivation,  and  examiner-examinee  relationship — as  well, 
although  no  attempt  to  measure  these  effects  will  be  made. 

A detailed  review  of  test-wiseness  research  and  findings 
will  be  presented  in  another  section. 






The  phenomenon  of  test  anxiety  has  probably  existed 
for  as  long  as  tests  have  been  used  for  the  purpose  of 
evaluating  the  performance  of  individuals  (Tryon,  1980). 
Spielberger  et  al.  (1978),  in  providing  a historical  over-
view of  research  on  the  effects  of  examination  stress,  cited
studies  which  investigated  the  nature  of  test  anxiety  as 
early  as  1914.  In  the  early  studies  attention  was  mainly 
focused  on  the  physiological  changes  which  occurred  upon 
emotional  arousal  in  test  situations  rather  than  on  indivi- 
dual differences  in  test  anxiety  as  a personality  trait. 

German  investigators  between  1932  and  1937  (Spielberger  et 
al.,  1978),  who  conceptualized  it  in  psychoanalytic  terms 


and  attributed  it  to  trauma  in  childhood.  These  studies 
were  never  translated,  so  have  received  little  attention  in 
the  literature.  Studies  by  Brown  and  his  colleagues  be- 
tween 1938  and  1949  (Spielberger  et  al.,  1978)  resulted  in 
the  first  questionnaire  for  identifying  test-anxious  stu- 
dents and  found  that  questions  dealing  with  feelings  of 
nervousness  and  being  worried  were  most  highly  correlated 
with  scores  on  the  scale.  They  concluded  that  students  who 




The  study  of  test  anxiety  began  in  earnest  with 
Mandler  and  S.  Sarason's  (1952,  1953)  investigations  of

contributed  to  the  understanding  of  the  nature  of  test 

Self-preoccupation.  Although,  as  has  been  noted  in
the  preceding  section,  there  was  considerable  interest  in 
evaluating  stress  in  the  1930s  and  1940s,  Seymour  Sarason 
and  George  Mandler  (Mandler  & Sarason,  1952,  1953)  are 
credited  with  pioneering  the  research  in  this  field.  They 
reported  a series  of  studies  with  college  students  which 
showed  that  students  with  high  test  anxiety  performed  less 
well  in  evaluative  situations  than  those  with  low  test 
anxiety.  They  attributed  this  to  the  arousal  of  task- 

individuals  and  theorized  that  test-anxious  people  react  to 
evaluative  stress  by  emitting  negative  self-centered 
responses.  These  anxious  and  task-irrelevant  responses 

who  are  highly  test-anxious  do  less  well  than  those  who  are 


Sarason,  Davidson,  Lighthall,  Waite,  and  Ruebush
(1960),  who  based  their  observations  on  studies  that
reported  relationships  between  scores  on  anxiety  scales  and 


other  per 


children 


those  who  have  self-depreciating


attitudes,  anticipate  that  they  cannot  meet  the  standards 
of  performance  of  others  or  of  themselves,  and  experience 
unpleasant  feelings  of  uneasiness  and  tension.  They 
suggested  that  the  test-anxious  response  has  two  major  and 
cumulative  effects:  it  narrows  the  perception  of  the  ex-
ternal field  and  prevents  a dispassionate  assessment  of  the
nature  of  the  problem-solving  task  and  so  interferes  with 
problem-solving  in  a testing  situation. 

High  test-anxious  individuals  are,  in  effect,  dividing 
their  attention  between  the  task  at  hand  and  their  own 
worry  about  failure  and  disapproval  (Nottelman,  1975;  Wine, 
1971).  Wine  (1974),  from  an  extensive  observational  study
of  sixty-six  fourth  graders,  concluded  that  high  test- 
anxious  students'  self-preoccupations  are  shown  in  the 
classroom  by  an  intense  orientation  to  the  teacher,  to 
evaluative  feedback  by  the  teacher,  and  to  directions  from 
the  teacher  about  what  they  are  supposed  to  do.  At  the 
same  time,  these  children's  attention  was  distracted  from 
the  interactions  of  other  children  and  from  other  student- 
teacher  interactions,  resulting  in  social  isolation  and  the 
tuning  out  of  an  important  source  of  learning  and  informa- 
tion in  the  classroom. 

Other  writers  have  also  viewed  test  anxiety  as  a ten- 
dency to  exhibit  self-centered,  interfering  responses  when 
faced  with  an  evaluative  situation  (e.g.,  Liebert  & Morris, 


1967;  Marlett  & Watson,  1968;  Sarason,  1975).  In  summary, 
it  can  be  said  that  the  self-centered,  self-focusing  ten- 
dencies of  the  high  test-anxious  student  narrow  the  range 
of  the  attentional  field,  eliminating  cues  from  their  peers 
and  dividing  the  remaining  attention  between  self-  and  task-
relevant  cues.

Components  of  test  anxiety.  Mandler  and  Sarason  (1952) 
investigated  the  role  of  drive  states  in  a testing  situa- 
tion; that  is,  the  extent  to  which  anxiety  is  evoked  in  an 
evaluative  situation  and  its  relation  to  performance.  Their 
assumptions  were  that  two  conflicting  drives  seem  to  be 
present  during  testing.  Learned  task  drives,  which  are  a 
function  of  the  nature  of  the  task,  test  materials,  and 
instructions,  elicit  responses  which  lead  to  completion  of 
the  task  at  hand.  Learned  anxiety  drives,  on  the  other 
hand,  are  a function  of  anxiety  reactions  previously  learned 
as  responses  to  stimuli  similar  to  those  found  in  the  test- 
ing situation.  Significant  to  the  research  on  test  anxiety 
was  their  identification  of  two  types  of  anxiety  responses: 
those  not  connected  with  the  nature  of  the  task  or  materi- 
als, which  cause  feelings  of  inadequacy,  helplessness,  and 
loss  of  esteem,  hindering  completion  of  the  task,  and  those 
directly  related  to  the  completion  of  the  task,  which,  in 
fact,  lead  to  completion  of  the  task.  In  their  studies 
with  college  students,  they  did  not  differentiate  between 
the  two  types  of  anxiety,  but  other  researchers  picked  up
the  two-faceted  concept  of  anxiety  and  further
developed  it.  Spielberger  (1966)  defined  two  types  of
anxiety:  state  anxiety,  a transitory  state  that  varies 

over  time;  and  trait  anxiety,  a characteristic  tendency 
toward  anxiety,  that  remains  relatively  stable  over  time. 

He  labeled  the  two  components  A-State  and  A-Trait  to  dif- 
ferentiate between  anxiety  as  a temporary  condition  and 
anxiety  as  a personality  trait.  In  applying  Spielberger's
concept  of  anxiety  to  research  on  test  anxiety,  it  has  been 
found  that,  in  general,  A-Trait  has  no  direct  effect  on 
performance,  although  A-Trait  and  A-State  sometimes  inter- 
act (King,  Heinrich,  Stephenson,  & Spielberger,  1976; 

O'Neil,  Spielberger,  & Hansen,  1969),  with  trait  anxiety
influencing  state  anxiety,  which  in  turn  influences  achieve-
ment.

Liebert  and  Morris  (1967),  in  examining  the  results  of 
factor-analytic  techniques  on  Mandler  and  Sarason's  (1952) 
Test  Anxiety  Questionnaire,  noted  that  two  classes  of 
factors  seemed  to  emerge:  cognitive  factors,  which  they 

labeled  "worry"  or  "lack  of  confidence"  (W)  and  factors 
which  referred  to  indices  of  autonomic  arousal,  which  they 
termed  "emotionality"  (E).  Working  with  undergraduate
college  students,  they  found  a high  negative  correlation 
between  worry  and  expectancy  of  success  or  lack  of  success 
on  a major  course  examination.  No  relationship  between 
expectancy  and  emotionality  was  found.  Later,  Spiegler,
Morris,  and  Liebert  (1968)  conducted  two  studies  designed  to
replicate  previous  findings  and  to  test  additional  hypoth-
eses concerning  the  worry-emotionality  distinction,  one
with  undergraduate  students  and  one  with  graduate  students.
These  studies  confirmed  the  hypothesis  that  performance 
expectancy  is  related  to  W scores.  They  also  found  that  E 
scores  increased  sharply  from  five  days  before  to  immediately 
before  the  testing  situation  and  decreased  immediately  after, 
whereas  W scores  remained  stable.  In  later  studies  by  Morris 
and  Liebert  (1969)  and  by  Doctor  and  Altman  (1969),  it  was 
found  that  W scores  were  negatively  related  to  performance  on 
intellectual  and  cognitive  tasks,  whereas  E scores  were 
related  to  performance  only  among  students  with  a low  W score. 

Morris,  Finkelstein,  and  Fisher  (1976)  made  the  first 
attempt  to  apply  the  worry-emotionality  distinction  to 
children's  school  anxiety.  They  hypothesized  that  worry 
would  increase  over  time,  as  students  became  more  aware  of 
the  importance  of  their  academic  performance,  and  that  the 
concern  would  increase  earlier  and  be  stronger  in  girls 
than  in  boys.  It  has  been  found  that,  in  general,  girls 
report  more  school  anxiety  than  boys  (Manley  & Rosemier,
1972;  Sarason  et  al.,  1960).  Morris  et  al.  did  two  studies  to  test
their  hypotheses.  In  the  first,  in  order  to  test  the 
hypothesis  that  girls'  anxiety  would  increase  faster  than 
boys'  with  age  (grades  3-8),  they  administered  Dunn's  School 
Anxiety  Questionnaire.  Five  types  of  school  anxiety 


identified  in  this  questionnaire  are  recitation  anxiety, 
test  anxiety,  report  card  anxiety,  achievement  anxiety,  and 
failure  anxiety.  Emotionality  and  worry  items  are  easily 
identified  in  this  scale.  No  significant  sex-grade  level
interactions  were  found.  Significant  sex  differences 
appeared  in  the  test  anxiety  items,  with  girls  reporting  a 
higher  level  of  anxiety  than  boys.  In  their  second  study 
they  administered  the  School  Anxiety  Questionnaire  to 
eighth  grade  students.  A month  later  a ten-item  worry- 
emotionality  questionnaire  (Liebert  & Morris,  1967)  was 
administered.  They  found  high  correlations  between  the 
types  of  school  anxiety,  especially  test  anxiety,  and  worry 
scores.  They  further  found  that  worry  was  more  highly 
related  to  all  types  of  school  anxiety  than  was  emotion- 
ality, and  that  worry  was  also  more  highly  related  to 
grades  than  emotionality.  Girls  had  significantly  higher 
scores  on  three  of  the  five  types  of  school  anxiety,  with 
test  anxiety  being  highest,  as  well  as  on  both  worry  and 
emotionality. 

Antecedents  of  Test  Anxiety 

Parental  relationships.  In  attempting  to  answer  the 
question  of  why  some  individuals  are  overly  anxious  about 
school  in  general  and  about  evaluative  situations  in  partic- 
ular, it  is  necessary  to  explore  possible  antecedents  of 
the phenomenon of test anxiety. Sarason et al. (1960)
indicated that the test-anxious response cannot be
understood without considering parental behavior. The most
important  situation  outside  of  school  in  which  the  child 
experiences  evaluation  from  significant  adults  is  the 
familial one. The authors (Sarason et al., 1960) hypothesized
that the reactions of the child to test-like situations
within the family tend to generalize to tests and
test-like situations in the classroom. This is most likely
to  occur  when  parents'  repeated  negative  evaluations  of  the 
child  produce  hostility  which  cannot  be  expressed.  If  it 
is  expressed  verbally,  it  is  punished;  if  it  is  expressed 
in  fantasy,  it  produces  guilt,  which  is  difficult  for  the 
young  child  to  handle.  Eventually  the  child  will  begin  to 
doubt  his  or  her  own  self-worth,  and  will  become  dependent 
upon  parents  and  other  authority  figures  for  approval, 
direction,  and  support.  This  dependency  behavior  may  be 
rewarded  and  encouraged  by  the  parents,  resulting  in  a 
breakdown  of  communication  between  parents  and  child. 

In a study by Perry and Millimet (1977), addressing
the  question  of  child-rearing  antecedents  of  anxiety  in 
children,  sixteen  low-anxiety  and  sixteen  high-anxiety 
eighth  grade  students  and  their  parents  completed  several 
questionnaires  concerning  family.  In  a test-like  situation, 
each  child  stacked  blocks  with  the  unpreferred  hand  while 
blindfolded.  Both  parents  were  present,  and  their  role  was 
to give verbal assistance to the child. These verbal
interactions were recorded. Taking all findings into
consideration, two distinct pictures of families of high-
and low-anxious children emerged. It was found that
families  of  the  low-anxious  children  were  characterized  by 
consistency  and  harmony  between  parents  regarding  rearing 
of  the  children.  The  children  were  permitted  freedom  and 
independence  within  clearly  set  limits,  and  were  punished 
when  they  overstepped  the  bounds.  The  children  viewed  the 
punishment  as  fair  when  it  occurred.  Parents  and  children 
frequently  worked  together  on  certain  designated  tasks. 
High-anxious  children  were  more  likely  to  come  from  a 
broken  home,  but  when  the  families  remained  together,  there 
was  a great  deal  of  inconsistency,  disagreement,  criticism, 
and  lack  of  definition  of  family  rules.  Both  parents 
tended to worry about other people's opinions of the family
and  made  the  children  aware  that  their  behavior  reflected 
on  the  family  reputation.  Children  of  these  families  were 
apt  to  be  placed  in  frequent  double-bind  situations. 

Correlates  of  Test  Anxiety 

Test  anxiety  and  need  achievement.  Sarason  et  al. 
(1960)  viewed  test  anxiety  as  an  anxiety  about  achievement 
in  certain  situations.  More  specifically,  the  test  situa- 
tion is  one  in  which  the  child  knows  that  he  or  she  will  be 
evaluated  on  what  has  been  achieved  in  the  past  or  will  be 
achieved  in  the  future.  Strong  achievement  motivation  may 
be  prompted  by  the  need  of  the  test-anxious  child  to  reach 
the standards imposed by parents or another authority
figure, such as the teacher. Although test-anxious children
may set goals for themselves, what really matters is
their need for approval from significant adults. Sarason
and Mandler (1952) hypothesized that the more individuals
feel  that  they  need  to  achieve  in  intellectual  tasks,  the 
more  likely  it  is  that  a challenging  situation  will  arouse 
fear  of  failing  and  of  receiving  resulting  punishments. 

They  further  predicted  that  high-anxious  subjects  would 
tend  to  come  from  families  where  intellectual  achievement 
is  considered  to  be  important.  Using  data  from  the  files 
of the Student Appointment Bureau of Yale University, they
collected  information  on  students  from  five  undergraduate 
classes  of  sophomores  and  juniors  concerning  fathers' 
occupation,  scholarship  grants,  fathers'  education,  and 
students'  previous  schooling.  They  found  a strong  correla- 
tion between  anxiety  as  measured  by  the  Test  Anxiety 
Questionnaire,  and  need  for  intellectual  achievement,  as 
interpreted  by  the  collected  data.  Their  findings  seem  to 
suggest  that  expectations  of  parents  continue  to  exert  con- 
siderable influence,  even  into  young  adulthood. 

Test anxiety and social class. It could be expected
that  the  need  for  achievement  might  be  closely  related  to 
social  class.  In  general,  parents  of  lower  socioeconomic 
status  do  not  tend  to  place  as  much  emphasis  on  intellec- 
tual achievement  as  those  from  a higher  level.  Where  parents 
do not value academic achievement, children are under no
particular  pressure  to  perform  well,  and  this  attitude  is 
often  reinforced  by  teachers,  who  do  not  expect  much  from 
children of this background (Sarason et al., 1960). However,
Sarason and Mandler (1952), in their study with
college  students,  found  only  a slight  relationship  between 
test  anxiety  and  social  class  factors,  although  there  was  a 
strong  relationship  between  test  anxiety  and  need  for 
achievement.  Sarason  et  al.  (1960)  explained  that  even  in 
a family  where  there  is  little  concern  about  the  level  and 
rate  of  intellectual  development  of  the  child,  children  are 
nevertheless  exposed  to  numerous  test-like  situations  in 
the  family  interactions,  where  they  are  evaluated  and  found 
lacking.  When  these  children  enter  the  school  culture 
they  become  test-anxious,  even  though  their  parents  are  not 
concerned  with  intellectual  adequacy.  Also,  within  the 

ment.  These  authors  acknowledged  a small  positive  relation- 
ship between  test  anxiety  and  social  class  indices,  and 
concluded  that  among  children  from  lower  class  homes,  where 
intellectual  values  are  not  stressed,  there  would  neverthe- 
less be  a large  number  of  test-anxious  children. 

Test  anxiety  and  intelligence.  The  relationship 
between  intelligence  and  test  anxiety  is  not  clear.  In  the 
majority of investigations, a low negative correlation
between anxiety and intelligence has been found (Phillips,
Martin, & Meyers, 1972). This may indicate that anxiety
interferes with performance on an intellectual task (e.g.,
Alpert & Haber, 1960; Feldhausen & Klausmier, 1965) but
does not necessarily mean that high-anxious individuals are
less intelligent.

intelligence,  Sarason  et  al.  (1960)  took  the  position  that 
anxiety  is  the  crucial  factor.  They  argued  that  the  rela- 
tionship between  anxiety  and  intelligence  depends  upon  the 

when  the  test  is  administered  in  a highly  test-like  atmos- 

Highly  test-anxious  children  perform  as  well  as  or  better 
than  the  low-anxious  when  evaluative  stress  is  low. 

Feldhausen  and  Klausmier  (1965)  examined  the  relation- 

the  Children's  Manifest  Anxiety  Scale  with  children  having 
low,  average,  and  high  IQs.  They  found  that  anxiety 
scores  correlated  with  IQ  and  achievement  in  a negative 
direction  for  the  middle  and  low  IQ  groups.  Correlations 
for the high IQ group approached zero. Spielberger (1966),
working  with  college  students,  found  that  academic  per- 
formance of  high  ability  students  was  also  affected  by  high 
anxiety  levels.  Stevenson  and  Odom  (1965)  found  signifi- 
cant negative  correlations  between  level  of  anxiety  and  IQ 
on the verbal portion of the California Test of Mental
Maturity for fourth and sixth grade boys and girls. For
the  non-verbal  portion,  the  correlation  was  significant 
only  for  sixth  grade  boys. 

Phillips  et  al.  (1972,  pp.  432-433)  suggested  that  the 
relationship  between  anxiety  and  intelligence  is  especially 
important because of its causal implications. Some questions
raised by them included: "Does the negative relationship
between anxiety and intelligence test scores
indicate  that  those  who  are  intelligent  are  more  capable 
of  coping  with  their  environment  and  are,  therefore,  less 
anxious?  Does  it  indicate  that  anxious  persons  have  a 
greater  difficulty  attending  to  and  retaining  information? 
Does  anxiety  lower  performance  on  the  test  that  would  have 
been  higher  if  the  anxiety  had  not  been  present?" 

Approaches  to  Minimizing  Effects  of  Test  Anxiety 

A variety  of  approaches  have  been  utilized  in  attempts 
to  reduce  test  anxiety  and,  at  the  same  time,  to  increase 
academic  and/or  test  performance.  Results  have  been  varied 
and  conflicting.  Some  of  the  studies  that  have  been  con- 
ducted and  their  results  will  be  reviewed  in  this  section. 

Systematic  desensitization.  Most  researchers  have 
thought  of  test  anxiety  as  a condition  of  excessive  arousal 
involving  anxiety-based  behaviors  (Kirkland  & Hollandsworth, 
1980). Therefore, systematic desensitization, a technique
designed  to  inhibit  or  extinguish  anxiety-evoking  imagery 
and  excessive  physiological  symptoms  (e.g.,  trembling. 


They used

Academic gains were measured by examining overall grade
point averages for the terms prior to and during the treatment.
Between-group comparisons showed no significant
differences  between  experimental  conditions  on  any  of  the 
dependent  measures.  Examination  of  within-group  differ- 
ences showed  that  systematic  desensitization  produced 
changes  on  test  and  trait  anxiety.  The  other  treatment 
groups  showed  significant  changes  on  test  anxiety  only, 
while the no-treatment group demonstrated no improvement on
any  of  the  subjective  measures.  None  of  the  groups  exhib- 
ited significant  increases  in  their  grade  point  averages. 

Study  skills.  Some  researchers  have  questioned  whether 
high  levels  of  physiological  arousal  alone  are  related  to 
poor  test  performance.  It  has  even  been  suggested  that 
such arousal can be helpful (e.g., Deffenbacher, 1978;
Kirkland & Hollandsworth, 1980; Osterhouse, 1972). Some
investigators  have  proposed  that  high  test-anxious  students 
have  poorer  study  skills  and  poorer  ability  than  the  low 
test-anxious. Benjamin, McKeachie, Lin, and Holinger (1981)
pointed  out  that  according  to  this  line  of  thought  high 
test-anxious  students  have  reason  to  be  anxious.  Test 
anxiety  and  poor  performance  have  a reciprocal  effect. 
Kirkland  and  Hollandsworth  (1979)  suggested  that  rather 
than  being  an  anxiety-related  disorder,  poor  test  perform- 
ance can  be  viewed  as  a skills  deficit,  and  thus  emphasis 
should  be  placed  on  remediating  this  deficit  rather  than  on 


determined  that  this  was  due 
emotionality  control 


than  high-i 


comparisons  within  this  study,  desensitization  subjects 
reported  less  anxiety  and  received  higher  scores  than  the 
study  skills  subjects. 

In  a study  by  Kirkland  and  Hollandsworth  (1980)  a 
skills-acquisition  treatment  for  test  anxiety  was  compared 
with two anxiety-reduction treatments (cue-controlled
relaxation and meditation) and a practice-only group. Performance
measures were cumulative grade point averages, the

test.  Obtained  results  showed  that  the  skills-acquisition 

test  as  well  as  grade  point  averages.  They  also  reported 

tional  interference  during  testing.  The  researchers 
suggested  that  the  term  test  anxiety  be  replaced  by  the 

Benjamin  et  al.  (1981)  used  an  information  processing 
model  for  an  analysis  of  test  anxiety.  According  to  this 
model,  information  is  processed  in  three  stages — encoded, 
stored  and  organized — and  then  retrieved  when  necessary. 
They  hypothesized  that  the  deficient  performance  of  high 
test-anxious  subjects  might  be  due  to  problems  in  one  or 
more  of  these  stages.  In  this  study  undergraduate  college 


the  Liebert-Morris  (1967)  scale.  They 


mate  on  a five-point  scale  the  degree  to  which  they  had 
difficulties  in  learning  the  material  for  the  course,  in 
reviewing  materials  for  the  examination,  and  in  remembering 
the  material  during  the  examination.  They  were  also  asked 
to  estimate  the  number  of  hours  they  spent  studying  for  the 
course  and  to  indicate  their  grade  point  averages.  The 
final  examination  included  multiple  choice  and  essay  ques- 
tions. Results  confirmed  that  high-anxious  students  had 
poorer  grades  in  that  course  and  poorer  grade  point  aver- 
ages. High-anxious  students  also  did  more  poorly  on  each 
type  of  examination  question  than  did  the  others,  although 
they  did  relatively  better  on  multiple  choice  than  on  short- 
answer  questions.  This  confirmed  the  hypothesis  that  high 
test-anxious  students  had  difficulty  with  retrieval  of 
information  for  the  examination.  In  multiple  choice  items 
the  student  has  only  to  recognize  the  correct  answer. 
High-anxious  students  reported  more  problems  in  each  phase 
of  the  course,  with  the  larger  differences  in  reported 
problems  in  learning  materials  throughout  the  course, 
implying  difficulties  in  encoding  and  organizing  informa- 
tion. The  authors  stressed  the  importance  of  study-skills 
training  programs  in  reducing  test  anxiety  stemming  from 
inability  to  process  information  properly. 

Cognitive modification. The major causes of performance
decrement for the high test-anxious student are
believed to be failure of the person to attend to relevant
parts  of  the  task,  intrusion  of  irrelevant  thoughts,  and 
high  autonomic  arousal  (Wine,  1971).  In  situations  where 
their  performance  is  being  evaluated,  high  test-anxious 
persons  spend  a large  part  of  their  time  worrying  about 
their  performance  and  how  well  others  are  doing,  ruminating 
over  alternatives  and  being  preoccupied  with  such  things 
as  feelings  of  inadequacy,  anticipation  of  punishment,  and 
loss of self-esteem (Mandler & Watson, 1966; Marlett &
Watson, 1968). Liebert and Morris (1967) identified the
worry  or  cognitive  component  of  test  anxiety,  rather  than 
emotionality,  as  more  likely  to  interfere  with  test  perform- 
ance. Meichenbaum  (1972)  suggested  a cognitive  modifica- 
tion treatment  procedure  to  deal  with  both  worry  and 
emotionality  components  of  test  anxiety.  The  first  part  of 
the  treatment  consisted  of  making  the  test-anxious  subjects 
aware  of  their  self-defeating  thoughts,  verbalizations,  and 
behavior  in  test  situations,  and  teaching  them  ways  by 
which  they  can  inhibit  these  thoughts.  The  second  part  of 
the  procedure  was  to  use  "coping  imagery"  as  a modification 
of  systematic  desensitization.  This  involved  teaching  sub- 
jects to  visualize  themselves  coping  with  their  anxiety  by 
means  of  slow,  deep  breaths  and  self-instructions  to  focus 
on  the  task.  Thus  the  cognitive  modification  treatment  was 
designed  to  deal  with  both  components  of  test  anxiety — worry 
and  emotionality. 


In this study undergraduate subjects were assigned to a
waiting-list control group, a standard desensitization group, or a
cognitive  modification  group.  Pre-  and  post-treatment 

assess  change.  In  addition,  subjects  were  assessed  before 
and  after  treatment  in  a laboratory  test-taking  situation, 

Comparisons  among  the  groups  on  the  performance  measures 
indicated  that  the  cognitive  modification  group  showed  the 
greatest  improvement,  although  not  significantly  different 
from  the  desensitization  group.  The  control  group  showed 
significantly  less  improvement  than  the  two  treatment 
groups  on  grade  point  average  and  the  digit  symbol  test, 
but  comparable  improvement  on  the  Raven's  Matrices  Test. 

The cognitive modification group showed the most significant
improvement in grade point average. Also, subjects in
the  cognitive  modification  group  demonstrated  the  most 

the  cognitive  measure  and  on  the  self-report  measure  of 
anxiety.  The  study  illustrates  that  high  worry  behavior  is 

Although  Meichenbaum's  (1972)  research  suggested  that 
cognitive  modification  might  be  a useful  approach  to  deal- 
ing with  test  anxiety,  it  did  not  permit  the  identification 


Holroyd (1976)


conducted


group,  a desensitization-only  group,  and  a no-treatment 
control  group.  Study-skills  training  was  given  to  each 
group.  Subjects  were  self-referred  university  students. 
Outcome  measures  were  the  Liebert-Morris  Test  Anxiety  Scale, 
self-ratings  of  Emotionality  and  Worry,  and  a digit  symbol 
performance  task.  The  major  finding  of  the  study  was  that 
the  cognitive  component  of  the  cognitive-behavior  modifica- 
tion treatment  for  test  anxiety  was  more  effective  than 
either  the  desensitization  component  or  a combination  of 
the  two.  It  appears  from  this  study  that  using  half  of 
treatment  time  for  desensitization  was  not  productive.  The 
results  of  the  study,  as  well  as  Holroyd's  study,  support 
the  theory  of  Liebert  and  Morris  that  it  is  the  cognitive, 
or  worry,  component  of  test  anxiety  that  produces  perform- 
ance decrements. 

Finger  and  Galassi  (1977)  examined  the  differential 
effects  of  treatment  for  the  emotionality  and  worry  com- 
ponents of  test  anxiety  using  a single  method  of  response 
modification.  The  subjects  for  their  study  were  under- 
graduates at  the  University  of  North  Carolina  at  Chapel 
Hill,  who  had  scored  high  on  debilitating  anxiety  as 
measured  by  the  Achievement  Anxiety  Test  of  Alpert  and 
Haber  (1960).  Students  were  randomly  assigned  to  one  of 
four groups: an attentional treatment, in which attention
to task-relevant responses was reinforced; a relaxation
treatment, where relaxation responses were reinforced; a


list

stimulus-response-reinforcement. Stimulus scenes were
standard  across  groups,  but  the  response  scenes  were 

each  group.  The  Liebert-Morris  Emotionality  and  Worry 
Scales  were  used  to  measure  state  anxiety.  Performance 

Symbols  Test.  Results  showed  that  effects  on  Worry  and 
Emotionality  were  the  same,  regardless  of  the  type  of  treat- 
ment. Also,  on  the  Facilitating  Scale  of  the  Achievement 
Anxiety  Test  significant  differences  failed  to  be  obtained 
between  the  groups  receiving  cognitive  treatment  and  those 
who  did  not.  Neither  were  significant  performance  changes 
seen  for  any  group.  This  study  suggested  that  although  the 

identified  independently,  they  may  interact  as  a single 
process  in  a testing  situation. 

Leal,  Baxter,  Martin,  and  Marx  (1981)  extended  the 
exploration  of  the  effectiveness  of  cognitive  modification 
versus  systematic  desensitization  for  the  alleviation  of 

students.  Tenth  grade  test-anxious  students  were  randomly 

desensitization,  or  waiting-list  co 


Thre 


selected  for  the  study  were  primarily  test-anxious  rather 
than  generally  anxious,  and  that  their  test  anxiety  was  not 
due  to  study  problems.  Before  and  after  the  treatment 
phase  of  the  program,  subjects  performed  on  the  Raven's 

dition  and  completed  the  State-Trait  Anxiety  Inventory- 

that cognitive modification was superior to systematic

with  the  systematic  desensitization  group  showing  a greater 

the  cognitive  modification  group,  although  only  the  gain 
for  the  systematic  desensitization  group  was  statistically 
significant.  The  researchers  cautioned  that  no  strong  con- 
clusions could  be  drawn.  For  the  most  part,  findings  of 
this  study  tended  to  generalize  from  those  of  the  univer- 
sity studies. 

Wine  (1974)  formulated  a treatment  program  based  on  a 

of  this  program,  she  worked  with  test-anxious  third  and 
fourth  graders,  who  were  placed  in  either  a task-attending 
training  group,  a placebo  treatment  group,  or  a no-treatment 
control group. The children were pre- and post-tested with
a general  ability  test,  the  IPAT  Test  of  G,  a brief  reading 
test, the Gates-MacGinitie Test of Speed and Accuracy, and
the  Test  Anxiety  Scale  for  Children.  The  training  dealt 
with  both  the  emotionality  and  worry  components  of  test 
anxiety.  Procedures  involved  training  in  self-instructions 
useful  in  approaching  tasks,  training  in  self-structuring 
of  tasks  and  in  progression  through  them  in  a systematic 
fashion,  and  self-instruction  in  relaxation,  which  included 
deep  breathing  exercises.  The  results  of  the  study  showed 
a significant  reduction  in  the  test  anxiety  of  children 
given  task-attending  training.  IQ  scores  were  signifi- 
cantly improved  in  the  task-attending  group,  but  no 
significant  improvement  in  reading  performance  was  shown  by 
any  group.  Test  anxiety  levels  of  children  in  the  placebo 
group  were  also  significantly  reduced,  but  their  cognitive 
performance  showed  no  significant  improvement. 

Test-wiseness  training.  Many  of  the  procedures 
described  in  the  preceding  studies  may  be  too  complicated, 
too  time-consuming,  or  too  far  removed  from  the  instruc- 
tional focus  of  most  classroom  teachers  or  school  counsel- 
ors. Teaching  test-taking  strategies  has  been  suggested  as 
a means  of  reducing  test  anxiety  and  improving  test  scores 
(Lange, 1978). Kirkland and Hollandsworth (1979) found
significant  correlations  between  impaired  academic  perform- 
ance and  both  high  levels  of  debilitative  anxiety  and  low 
levels of facilitative anxiety, as well as with deficits in
test-taking skills. They raised the question of whether
test  anxiety  interferes  with  test-taking  behavior  or 
whether  the  lack  of  test-taking  skills  causes  test  anxiety. 
More  studies  are  needed  to  help  establish  the  relationship 
between  test-wiseness  and  test  anxiety. 


Definition  and  Analysis 
The definition of test-wiseness most often used
in the literature is "a subject's capacity to utilize the
characteristics and formats of the test and/or the test-
taking situation to receive a high score" (Millman, Bishop,
& Ebel, 1965, p. 707). Test-wiseness is logically independent
of the examinee's knowledge of the subject matter for
which the items are supposedly measures.

Millman et al. (1965) provided a list of principles
which students should apply in taking tests. The list was
synthesized from test construction principles and problem-solving
styles of test-takers and was intended as a theoretical
framework for future research in the area of test-wiseness.
They divided the principles of test-wiseness
into two major categories:

I. Elements independent of test constructor or test
purpose.

A. Time-using strategy

B. Error-avoidance strategy

C. Guessing strategy

D. Deductive reasoning strategy

II. Elements dependent upon the test constructor or
purpose.

A. Intent consideration strategy

B. Cue-using strategy

An  explanation  of  the  Millman  et  al.  framework  is  in 
order,  since  much  of  the  literature  on  test-wiseness  refers 
to  their  analysis.  The  principles  falling  into  the  first 
category  are  potentially  valid  in  all  testing  situations 
regardless  of  previous  contact  with  the  test  constructor  or 

A. Time-using strategy applies only to tests having
time limits and is concerned with the most efficient use
of the allotted time.

B.  Error-avoidance  strategy  applies  to  all  testing 
situations concerned with the avoidance of careless mistakes.

C.  Guessing  strategy  may  make  it  possible  for  the 
examinee  to  receive  credit  for  answers  made  on  a purely 
chance  basis. 

D.  Deductive  reasoning  strategy  provides  methods  of 
obtaining  correct  answers  indirectly  or  with  only  part  of 
the  information  needed  to  answer  the  question. 

The  second  main  category,  elements  dependent  upon  the 
test constructor or test purpose, includes strategies which
should be used only when the examinee knows the test constructor's
views or the purpose of the test, or has had
contact with similar tests.

A.  Intent  consideration  strategy  is  concerned  with 
strategies  that  allow  the  examinee  to  avoid  losing  credit 
for  anything  other  than  lack  of  knowledge  of  subject  matter. 
It  involves  attention  to  the  intent  of  the  test  constructor 
in  including  specific  questions  or  items. 

B.  Cue-using  strategy  attends  to  the  use  of  cues 
which are evident when a specific answer is not known.
These may include differences in length of options in a
multiple  choice  test,  grammatical  inconsistencies  between 
options  and  stem,  absurd  options,  and  similar  language  in 
options  and  stem.  However,  the  authors  warn  that  this 
strategy  should  be  used  only  when  the  examinee  is  not  able 
to  use  knowledge  of  the  subject  matter  and  reasoning 

Historical  Perspective 

As stated by Sarnacki (1979, p. 252) in his comprehensive
review of the research on test-wiseness, "the
construct  of  test-wiseness  has  a relatively  short  history 
in  educational  research."  Its  possible  effect  on  test 
reliability was first mentioned by Thorndike (1951), who
classified  it  as  a possible  source  of  variance  in  test 
scores  and  described  it  as  a persistent  general  trait  of 


Ebel and Damrin (1960) later listed test-wiseness
as a component of research variance in objective
test  questions. 

The first significant research in the area of test-wiseness
was done by Gibb (1964), who provided an operational
definition of test-wiseness and developed an instrument
to measure it. Millman, Bishop, and Ebel (1965) then
provided  a comprehensive  analysis  of  TW,  the  purpose  of 
which  was  to  provide  a framework  for  future  research. 

Since  that  time,  numerous  studies  have  been  reported, 
resulting  in  increasing  recognition  of  TW  as  a source  of 
error  variance  in  test  scores  and  a depressor  of  test 
reliability  and  validity  (Sarnacki,  1979),  as  well  as 
increasing  understanding  of  the  construct  of  TW. 

Evidence  for  the  Existence  of  Test-Wiseness 

TW  can  be  verbalized.  There  is  evidence  that  high 
school  students  can  describe  some  of  the  principles  of 
test-wiseness. In a survey conducted by Millman et al. (1965),
240 high achieving students in a suburban high school were
instructed  to  write  suggestions  that  they  might  give  to  a 
new  student  who  was  having  difficulty  getting  good  scores 
on  some  of  the  teacher-made  tests.  The  responses  included 
such  items  as  planning  time,  answering  easier  questions 
first,  guessing  when  the  answer  is  not  known,  and  eliminat- 
ing possible  foils. 


Diamond and Evans (1972), in a study with sixth grade
students,  found  that  these  students  also  were  able  to 
verbalize  the  principles  of  test-wiseness  as  evidenced  by  a 
discussion  with  a group  of  the  students  after  a testing 
session.  They  included  many  of  the  same  items  as  the  high 
school  students. 

TW  can  be  measured.  It  stands  to  reason  that  if 
test-wiseness  can  be  verbalized,  it  can  also  be  measured. 
Various  researchers  (e.g.,  Slakter,  Koehler,  & Hampton, 
1970a; Diamond & Evans, 1972; Flynn & Anderson, 1977)
devised  instruments  to  measure  test-wiseness.  These  instru- 
ments consist  of  nonsense  items  or  items  based  on  ficti- 
tious materials,  having  no  correct  answer,  and  embedded  in 
legitimate items. One option on each nonsense item is designated
as a test-wise response because of cues contained in
the  items.  Students  would  average  one  test-wise  response 
for  every  four  items  on  the  nonsense  portion  of  the  test  if 
they  merely  guessed.  However,  the  students  chose  the  test- 
wise  response  on  more  than  half  the  items,  which  is  more 
than  twice  the  expected  success  rate  from  guessing.  Hence, 
they  must  have  responded  to  cues  within  the  test  items. 
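
The chance-level reasoning above can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the four-option item format follows from the "one in four" guessing rate given in the text, but the number of nonsense items (20) and the observed count of test-wise responses (11, that is, "more than half") are hypothetical values chosen for the example and are not data from the studies cited.

```python
from math import comb


def chance_rate(n_options: int) -> float:
    """Probability of picking the test-wise option by blind guessing."""
    return 1.0 / n_options


def binomial_upper_tail(n_items: int, k_min: int, p: float) -> float:
    """P(X >= k_min) when X ~ Binomial(n_items, p)."""
    return sum(
        comb(n_items, k) * p**k * (1.0 - p) ** (n_items - k)
        for k in range(k_min, n_items + 1)
    )


# From the text: guessing yields one test-wise response per four items.
n_options = 4

# Hypothetical values for illustration only (not reported in the source).
n_items = 20    # assumed number of nonsense items
observed = 11   # assumed count of test-wise responses ("more than half")

p_guess = chance_rate(n_options)
expected_by_chance = n_items * p_guess
ratio_to_chance = (observed / n_items) / p_guess
tail_probability = binomial_upper_tail(n_items, observed, p_guess)

print(f"Expected test-wise responses by guessing: {expected_by_chance}")
print(f"Observed rate relative to chance: {ratio_to_chance:.2f}")
print(f"Probability of {observed} or more by guessing alone: {tail_probability:.4f}")
```

Under these assumed values, the observed rate exceeds twice the chance rate, and the binomial tail probability indicates how unlikely such a result would be if students were only guessing, which is the logic behind the inference that they responded to cues within the items.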

TW  can  be  taught.  There  is  some  controversy  among 
researchers  as  to  the  cognitive  aspect  of  test-wiseness. 
Diamond  and  Evans  (1972)  concluded  from  their  study  with 
sixth grade students that test-wiseness is cue specific,
rather than a general cognitive ability. Jongsma and Warshauer
(1975, p. 14) decided that test-wiseness is not a general
trait,  but  a "network  of  specific  and  independent  skills." 
Others  (e.g.,  Ebel  & Damrin,  1960;  Woodley,  1972)  view  it 
as  a specific  cognitive  skill  which  can  be  developed 
through  experience  and/or  training  in  taking  tests. 

Whatever  their  position  might  be,  most  experts  agree 
that  principles  of  TW  can  be  taught  to  students  of  all  ages, 
using a variety of techniques (e.g., Jongsma & Warshauer,
1975; Jones & Ligon, 1981). Studies have been done, ranging
from pre-schoolers (Oakland, 1972; Dreisbach & Keogh,
1982) to elementary school students (Jongsma & Warshauer,
1975;  Eakins,  Green,  & Bushnell,  1976)  to  post-secondary 
students  (Flynn  & Anderson,  1977;  Woodley,  1972).  Results 
have  not  been  statistically  significant  in  all  instances 
but, as stated by Jongsma and Warshauer (1975, p. 18), "the
gains  nearly  always  favor  the  instructional  group." 

Correlates  of  Test-Wiseness 

One  possible  research  question  suggested  by  Millman  et 
al.  (1965)  in  their  analysis  was,  "What  are  the  correlates 
of  test-wiseness?"  Knowledge  of  which  variables  TW  is 
related  to  could  provide  new  directions  for  research  as 
well  as  help  to  establish  validity  of  the  construct  of  TW 
(Sarnacki, 1979). Several possible correlates of TW have
been  investigated. 

Grade level and sex. The relationship between TW,
grade level, and sex was investigated by Slakter, Koehler,
and Hampton (1970a) in a cross-sectional study with students
in grades five through eleven. They found that while
sex  was  not  related  to  TW  abilities,  grade  level  was.  As 
grade  level  increased,  students'  performance  on  TW  scales 
also  increased.  In  discussing  possible  reasons  for  the 
increase  over  grade  levels,  the  authors  suggested  that  this 
might  be  due  to  greater  experience  with  tests,  maturation, 
or  changing  population.  However,  later  longitudinal 
studies  were  undertaken  by  Crehan,  Koehler,  and  Slakter 
(1974)  after  a two-year  interval  and  by  Crehan,  Gross, 
Koehler,  and  Slakter  (1978)  after  a four-year  interval. 

The  results  of  these  studies  indicated  also  that  TW  appears 
to  be  somewhat  stable,  increasing  over  grades,  with  no 
evidence  of  sex  differences  or  sex  by  grade  interaction. 
Large  individual  differences  in  TW  persisted  into  the  high 
school  grades. 

Intelligence.  It  is  generally  agreed  that  TW  is  a 
cognitive  ability  of  test-takers  (e.g.,  Woodley,  1972). 
Hence,  it  would  be  expected  to  relate  to  intelligence 
(Sarnacki,  1979).  In  an  early  study,  Dunn  and  Goldstein 
(1959)  explored  the  relationship  between  TW  and  general 
mental  ability  by  examining  correlations  between  TW  scores 
on blocks of items written in varying degrees of conformance
to item writing principles and scores on an army
aptitude  test.  They  obtained  correlations  of  zero  between 
intelligence  and  TW  abilities. 


Diamond and Evans (1972) investigated cognitive correlates
of TW with a group of sixth grade students. They
found moderate positive correlations between intelligence,
as measured by the Lorge-Thorndike Intelligence Test, and
only three out of five specific TW cues. They concluded
that TW is not a pervasive skill but is specific to the
particular  clue  or  cue  under  investigation  and  has  little 
relationship  to  a student's  cognitive  ability. 

The  findings  of  these  studies  are  consistent.  The 
ability  to  use  test-taking  strategies  does  not  seem  to  be 
related  to  general  intelligence  as  measured  by  group 
intelligence tests. These findings indicate that test-wiseness
skills can be taught to students at any level of
intelligence.

Verbal ability. Since recognition of most TW clues is
dependent upon such skills as knowledge of grammar, vocabulary,
and sentence structure, it seems that test-wise
students should obtain high scores on tests of verbal
ability (Sarnacki, 1979). The relationship between TW and
verbal skills has been examined in several studies. For
example, in the Diamond and Evans (1972) study, correlations
between verbal scores from the Iowa Test of Basic
Skills and total TW scores, as well as scores on two subscales
of the TW measure, support the idea that some verbal
ability is associated with test-wiseness. Rowley (1974),
working with high school students, found a significant
positive relationship between TW and performance on a
multiple-choice vocabulary test.

Anxiety. Although the nature of the relationship between
TW and anxiety has not been established, it is
generally  acknowledged  that  test-wiseness  and  anxiety  are 
not  compatible  (Sarnacki,  1979;  Woodley,  1978).  Sarnacki 
(1979)  further  pointed  out  that  in  order  to  notice  and 
profit  from  TW  cues,  examinees  must  show  some  degree  of 
composure.  Those  who  do  not  have  this  composure  may  become 
anxious  in  testing  situations  to  the  extent  that  they  fail 
to  notice  important  cues. 

Approaches  to  Minimizing  Effects  of  Test-wiseness 

It is generally acknowledged that a lack of test-taking
skills among students is a possible source of error
variance  in  standardized  test  scores.  Attempts  to  minimize 
the  effects  of  TW  include  more  careful  test  construction 
and/or  providing  students  with  better  strategies  for  taking 
tests.  Some  approaches  to  achieving  this  goal  will  be 
discussed. 

Test  construction.  Bergman  (1980)  suggested  that  TW 
is important for two groups of people: test-makers and
test-takers. One way of reducing error variance in test
scores would be for test-makers to construct tests that are
"test-wise proof." This would necessitate clear directions
and  the  elimination  of  items  which  contain  TW  cues. 




fifty-seven  hours,  while  an  increase  of  thirty  points  would 
entail  two  hundred  sixty  hours  of  coaching. 

After investigating several coaching studies, the
trustees of the College Entrance Examination Board issued a
statement to the effect that the usual score gains resulting
from coaching amount to fewer than ten points (Ford,
1973). Other studies reported by Anastasi (1981) indicated
that  the  usual  short-term  high  school  coaching  programs 
yielded  average  gains  of  approximately  ten  points  in  SAT- 
Verbal  scores  and  approximately  fifteen  points  in  SAT- 
Mathematics  scores. 

Anastasi  (1981)  made  the  point  that  individuals  with 
deficits  in  their  education  are  more  likely  to  benefit  from 
coaching  than  those  who  have  had  adequate  educational 
opportunities  and  who  already  do  well  on  tests.  She 
further  pointed  out  that  the  "closer  the  resemblance 
between  test  content  and  coaching  material,  the  greater 
will  be  the  gain  in  test  scores,"  but  that  "the  more 
closely  the  instruction  is  restricted  to  test  content,  the 
less  likely  it  is  that  improvement  will  extend  to  criterion 
performance"  (Anastasi,  1981,  p.  1089). 

Similarly, although an investigation of the effects of
coaching on mathematics scores of the Graduate Record
Examination met with some success, the length of time
and the intensity of the program were considered to be
excessive (Ford, 1973).


Teaching test-taking strategies. In contrast to
coaching studies, TW studies have focused on instruction in
test-taking strategies with no attempt to teach subject
matter material. Such strategies include familiarity with
the format used in standardized tests (Oakland, 1972),
familiarization with item types (McMillan, 1967; McPhail,
1978), deductive reasoning skills (Costar, 1980), efficient
use of time (Jongsma & Warshauer, 1975), and guessing
strategies (Moore, Schutz, & Baker, 1966; Slakter et al.,
1970b). A number of studies focusing on acquisition of TW
skills  will  be  reviewed  in  detail  in  the  following  section. 

Specific programs for teaching test-wiseness strategies
have begun to appear in elementary and secondary
schools  and  in  college  and  adult  programs.  Objective 
evidence  of  the  teachability  of  TW  has  been  obtained  for 
each  of  these  levels.  Examples  of  the  work  that  has  been 
done  are  reviewed  in  this  section. 

Pre-school and elementary school. Oakland (1972)
worked with children from Head Start classes in Austin,
Texas, using curricular materials designed to increase TW
of pre-school children who were unfamiliar with standardized
tests. The children, who were predominantly black and
Mexican-American, were randomly assigned to the experimental
and control groups. The children in the experimental
group worked with their teachers, using the TW curriculum,
during 30-minute periods twice a week for six weeks.
Children in the control group worked with teacher aides on
special activities. All children took the Metropolitan
Readiness Test prior to the treatment in March, immediately
after the treatment in May, and at the beginning of the
first grade in September. Significant group differences
were obtained on the Total Score and on the Matching subtest
of the MRT, as well as marginal gains on other subtests.
Group differences on the second posttest, administered
four months later, were not significant.

Dreisbach  and  Keogh  (1982)  assessed  the  effects  of 
training  in  test-taking  skills  on  the  performance  of  young 
Mexican-American  children  on  a school  readiness  test. 
Experimental  groups  of  kindergarten  children  were  trained 
in  test-taking  skills,  while  control  groups  participated  in 
a supervised  and  organized  coloring  activity.  All  training 
was  done  in  Spanish.  One  week  after  training,  a readiness 
test  was  administered  once  in  Spanish  and  once  in  English 
to  experimental  and  control  groups.  It  was  found  that  the 
trained  group  performed  better  than  the  untrained  group 
whether  the  test  was  given  in  Spanish  or  in  English.  This 
study  is  significant  for  children  whose  low  test  scores 
might  reflect  a deficit  in  language  competence  rather  than 
in  the  abilities  being  measured. 

Eakins, Green, and Bushnell (1976) investigated the
effects of practice with an instructional test-taking unit
on students' performance gains with the Metropolitan
Achievement Test. A sample of 170 first grade students
received multiple, single, or no presentations of the test-taking
unit. At the end of the unit the MAT was administered
by a testing team hired by the school district.
Results showed consistently that students given multiple
presentations made greater gains than students given a
single presentation, and that students given a single
presentation made greater gains than those who had no presentation.
The researchers concluded that practice in taking
tests  may  be  the  most  effective  training,  and  if  it  is 
impossible  to  give  multiple  presentations,  then  a single 
presentation  is  better  than  none. 

Callenbach  (1973)  conducted  a study  with  second  grade 
students  in  a parochial  elementary  school  in  central 
Pennsylvania.  The  children  were  randomly  assigned  to 
experimental  or  control  groups,  with  the  experimental  group 
receiving eight 30-minute periods of instruction and practice
in content-independent standardized test-taking
techniques, and the control group receiving similar lessons
presented without instruction in test-taking skills. Immediate
and delayed effects were assessed by administering
the  Stanford  Reading  Test.  Results  showed  that  students 
who  received  instruction  in  test-taking  skills  achieved 
significantly higher scores on the posttest administered
the week after the instruction as well as on the delayed
posttest  administered  four  months  later. 

Kalechstein et al. (1981) conducted a study to investigate
the effect of instruction in test-taking skills in
black inner-city children. A sample of black second grade
children in a Title I public elementary school in Los
Angeles were randomly assigned to an experimental or control
group. Instructional materials based on the format of
the Stanford Achievement Reading Test, Primary level, were
developed and presented to the experimental group in ten
30-minute sessions over a period of five weeks. The control
group received similar instruction using a different
format. At the end of the instruction the SAT Reading Test
was administered to experimental and control groups. Differences
between the group means were statistically significant,
demonstrating that black children from a Title I school can
benefit from TW instruction.

A study done by Taylor and White (1982), working with
second grade students, investigated the effect on group
standardized achievement tests of teachers trained in test
administration techniques, students trained in TW skills,
and students reinforced for bettering their performance
over predictions made from pretest scores. Teachers of the
classrooms that were randomly assigned to the experimental
group received eight hours of training in appropriate test
administration techniques. Students who were assigned to
the training group received one hour of training in test-taking
skills one to two weeks before the testing. Students
in the reinforcement group were paid one nickel for each
raw score point above an established base score predicted
from each student's score on the SAT administered the previous
fall. Results showed that students who were reinforced
for higher performances scored significantly higher
than  those  who  were  not  reinforced.  Although  there  was  a 
raw  score  difference  between  test  scores  of  students  who 
took  the  test  from  trained  and  untrained  teachers,  this 
difference  was  not  statistically  significant.  Also,  no 
statistically  significant  difference  was  found  between 
trained  and  untrained  students  on  test  scores.  The  authors 
concluded  that  the  short  amount  of  time  given  to  the  student 
training  and  the  contents  of  this  particular  training  packet 
contributed  to  these  results. 
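
The payment rule for the reinforcement group in the Taylor and White (1982) study amounts to a simple formula. The following is only an illustrative sketch: the function name and the example scores are hypothetical, and it assumes that a student scoring at or below the predicted base score received nothing, which the summary above implies but does not state.

```python
NICKEL_CENTS = 5  # one nickel per raw score point, as described in the summary


def reinforcement_cents(raw_score: int, base_score: int) -> int:
    """Payment in cents: one nickel for each raw score point above the base score.

    Assumption (not stated in the source): no payment at or below the base score.
    """
    points_above_base = max(0, raw_score - base_score)
    return NICKEL_CENTS * points_above_base


# Hypothetical scores for illustration only.
print(reinforcement_cents(raw_score=48, base_score=42))  # 30
print(reinforcement_cents(raw_score=40, base_score=42))  # 0
```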

Costar (1980) described a study conducted with fourth
grade students, using a program called Scoring High in
Reading (Cohen & Foreman, 1978). In the experimental group
this program was used exclusively as the reading instructional
program. In addition to reading skills, test-taking
skills such as eliminating inappropriate answer choices,
identifying key words, and reasoning from facts or evidence
were taught. The control group followed the regular reading
program with no instruction in test-taking skills.

After the instruction, all students took the reading
subtests of the Metropolitan Achievement Test. After an
interval  of  approximately  two  months,  the  test  was  again 
administered  to  the  experimental  and  control  groups. 

Results  showed  that  there  was  no  statistically  significant 

involved  in  the  study  felt  that  the  program  was  valuable 
and  that  the  results  would  have  been  different  had  there 
not  been  the  pressure  to  complete  the  instruction  before 

As  part  of  a project  spearheaded  by  the  Department  of 
Elementary  and  Secondary  Education  at  New  Orleans 

studies  to  investigate  the  effects  of  a unit  in  test-taking 
strategies  on  reading  achievement  test  scores  of  fifth 
grade  students.  The  same  unit  was  presented  to  students  at 
two  schools  in  the  Metropolitan  New  Orleans  area,  one  a 
suburban  school  and  the  other  an  inner-city  school.  In  the 


two  groups — high  achieving  and  low  achieving — on  the  basis 
They  were  then  randomly  assigned  to  experimental  and  con- 
the  test-taking  unit,  which  required  about  an  hour's 

On  each  of  the 


three  subte 



scores  of  the  experimental  group  were  greater  than  those  of 
the  control  group,  with  high  achievers  benefiting  more  from 
the  test-wiseness  instruction  than  low  achievers.  None  of 
the  differences  between  the  groups  reached  statistical 
significance.  However,  the  investigators  felt  that  the 
differences  were  educationally  significant,  as  the  students 
gained  a grade  equivalent  difference  of  several  months. 

In  the  second  part  of  their  study,  Jongsma  and 
Warshauer  (1975)  went  to  a school  in  a lower  middle  class 
neighborhood,  where  the  majority  of  the  students  were  black. 
Fifth  grade  students  in  two  classes  were  randomly  divided 
into  two  groups  and  one  group  was  randomly  assigned  to 
receive  the  experimental  treatment.  On  the  day  following 
the  treatment,  the  reading  section  of  the  Comprehensive 
Test  of  Basic  Skills  was  administered  to  all  students. 

Again,  although  the  means  of  the  experimental  group 
exceeded  the  means  of  the  control  group  for  all  subtests, 
none  of  the  differences  was  statistically  significant. 

Petty and Harrell (1977) used a cognitive approach to
study the problems of motivation, anxiety, and test-wiseness
in a group test situation. Using programmed texts,
they utilized three groups of sixth grade children to
evaluate the separate effect of each of the three programs,
one group to receive all three programs, and a control
group. They found a significant increase in IQ scores on
the Otis-Lennon Mental Ability Test between the control
group and the combined group. No significant differences
were noted among the three groups who received the program
separately.

Shuller (1979) described a comprehensive TW skills
instructional program called Mini Tests, developed in the
New York City Public School System, and designed to teach
students the type of TW skills necessary for optimal performance
on standardized reading tests. Preliminary assessment
in the New York City Public Schools showed that
schools using the Mini Test Kits made statistically significant
gains on reading test scores over the schools
which did not purchase the kits. After supplementary
materials were developed, and more schools had purchased the
kits, another study was done. Again, the findings were
that the schools which purchased and used the materials
made statistically significant gains on standardized reading
test scores.

Working  on  the  assumption  that  teachers  at  all  levels 
need  to  be  knowledgeable  about  test-wiseness  strategies  and 
be  able  to  teach  them  to  their  students,  Flippo  and 
Borthwick  (1981)  trained  a group  of  undergraduate  student 
teachers  in  TW  skills.  The  teachers  then  taught  the 
strategies to groups of elementary school students. Students
in the experimental groups received TW strategies
each day for three weeks, while students in the control
groups worked on activities such as art or library work.
Results showed no significant difference between experimental
and control groups, although there was a slightly
higher observed score for six of the nine experimental
groups. They concluded that the student teachers were not
able to transfer the training they received to the students.

Secondary  school.  Moore,  Schutz,  and  Baker  (1966) 
used a programmed text on guessing strategy as the instructional
tool in a study done with eighth graders. The
program was designed to develop optimal strategy for
dealing with speed and power tests scored with or without a
correction for guessing. Analysis of the results was based
on the number of questions answered under each of four sets
of responses, regardless of the number of correct responses.
The main point of interest was how well experimental subjects
utilized the strategies taught in the program. The
conclusion  was  that  self-instructional  techniques  can  be 
used  to  teach  problem-solving  strategies,  specifically 
guessing  strategy. 

In a study by Wahlstrom and Boersma (1968), 117 ninth
grade students were randomly assigned to six groups: two
control, two experimental, and two placebo. The treatments
were, respectively, four 25-minute periods of watching
television, instruction in test-wiseness, and discussion
of occupational information. Each group received pre-
and post-tests in social studies. Differences between pre-
and post-test mean scores for the control and placebo
groups were not significant. One experimental group scored
significantly higher than the control groups. These results
indicated that teaching of test-wiseness principles
can be implemented with students and lead to an increase in
achievement test scores.

Slakter  et  al.  (1970b)  designed  programmed  texts  to 
aid  in  the  learning  of  four  TW  behaviors  and  to  train 
examinees  to  answer  each  item,  even  under  a penalty  for 
guessing.  High  school  seniors  in  one  school  were  randomly 
assigned  to  one  of  the  two  instructional  groups,  and  each 
served  as  a control  group  for  the  other.  The  instruments, 
developed  by  the  researchers,  were  used  to  measure  TW  and 
guessing  strategy.  One  instrument  was  administered  the  day 
after  administration  of  the  program,  and  the  other  was 
administered two weeks later. Analysis of the scores indicated
that the attempt to teach the subjects to respond to
all  items  was  successful,  with  the  group  effects  being 
significant  at  the  .05  level.  The  learning  group  effects 
were  also  significant  at  the  .05  level  with  respect  to 
total  TW  scores,  and  more  specifically  with  respect  to 
learning  and  retention  of  stem-option,  similar-option,  and 
specific-determiner  items.  This  study  provides  evidence 
that  TW  skills  can  be  taught  and  retained,  and  further 
suggests  that  it  appears  feasible  to  identify  students  low 
in  TW  skills  and  provide  them  with  a program  to  alleviate 
their  deficiencies  in  this  area. 




Fifty-four academically talented twelfth grade students
at an urban high school in Philadelphia were involved
in a study by McPhail (1978), who investigated whether a
group  of  urban  black  and  minority  high  school  students 
could  be  taught  test-taking  skills  for  standardized  reading 
comprehension  tests.  Four  modified  versions  of  the  Iowa 
Silent  Reading  Tests  were  developed  as  criterion  measures. 
The  Test-Wiseness  Curriculum  and  the  Psycholinguistic  Cues 
Curriculum,  developed  by  the  researcher,  provided  the 
instruction  for  the  experimental  groups.  Treatment  effects 
for  the  experimental  groups  were  obtained,  although  they 
were not significant at the .05 level of confidence. However,
several students improved by as much as 140 points on
the  SAT-Verbal,  taken  after  the  training. 

Post-secondary. Two hundred thirty-seven college
students enrolled in an undergraduate course in psychology
comprised the target sample for a study by Flynn and
Anderson (1977). Pretest administration of a TW instrument
allowed dichotomizing the sample into test-wise and test-naive.
Subjects were randomly assigned to instructional
and non-instructional groups. Two months after pretesting,
the instructional group received a taped instructional
program in TW, while the non-instructional group received a
control tape recording. Post-testing, which followed
immediately, consisted of a readministration of the TW
instrument and administration of the Thurstone Test of
Mental Alertness. The final examination of the course
showed  that  all  subjects,  whether  test-wise  or  test-naive, 
exhibited  a residual  gain  in  TW  that  was  statistically 
significant.  However,  the  effects  of  instruction  did  not 
generalize  to  the  other  criterion  measures.  Those  subjects 
who  were  initially  found  to  be  test-wise  demonstrated 
superior  performance  on  both  the  intelligence  and  the 
achievement  measures.  The  findings  suggest  that  these 
types  of  instruments  are  sensitive  to  some  type  of  TW  and 
that  TW  does  provide  a source  of  error  in  objective-type 
assessments  of  intelligence. 

In  a study  described  by  Frierson  (1977),  test-taking 
intervention procedures were conducted for minority premedical
students. The sample group consisted of eleven
students  who  had  previously  taken  the  Medical  College 
Admission  Test.  The  science  subtest  of  this  instrument  was 
used  as  the  criterion  measure.  After  the  instruction  in  TW 
strategies, these students were able to increase their
scores by an average of 90.91 points, a statistically
significant difference. These findings can have vast
implications for minority students with respect to access
to educational and career opportunities.


The  Classroom  Guidance  Unit 


An  important  part  of  a developmental  guidance  program 
in  an  elementary  school  is  group  guidance  (Dinkmeyer,  1970; 
Muro,  1970;  Shertzer  & Stone,  1976).  Group  guidance  has 
been described by Muro and Freeman (1968, p. 44) as:

All aspects of the guidance program that are content
centered and involve such counselor activities
as dispensing occupational and educational
information, planning and conducting orientation
programs, group follow-up meetings, and group
testing.

The  classroom  guidance  unit  falls  under  this  phase  of 
the  guidance  program.  A guidance  unit  is  a series  of 
planned  activities  focusing  on  some  particular  aspect  of 
the  students'  school  life.  Because  it  is  impossible  to 
provide  individual  or  even  small  group  counseling  to  all 
students  in  the  elementary  school,  the  classroom  guidance 
unit  provides  a means  of  including  all  students  in  the 
school  guidance  program.  In  addition,  unless  significant 
guidance  activities  are  carried  out  in  the  classroom,  it 
is  unlikely  that  the  guidance  program  will  reach  students 
in  the  context  in  which  their  learning  is  centered 
(Dinkmeyer, 1970).

Appropriate  topics  for  classroom  guidance  units  might 
focus  on  personal  development  with  emphasis  on  topics  such 
as  making  friends,  developing  good  study  habits,  developing 
positive attitudes toward school, or preparing for tests;
vocational information, including learning about different
categories of jobs, about individual aptitudes, and developing
good attitudes about work; and learning problem-solving
procedures.  In  these  activities  there  should  be  an 
emphasis  on  reality  testing,  whereby  pupils  can  validate 
their  own  feelings  by  listening  and  learning  from  others 
(Dinkmeyer, 1970). Knowing what their classmates feel and
how  they  cope  with  their  problems  helps  students  to  assess 
their  own  feelings  and  coping  procedures. 

A description of a sixth grade guidance unit in test-taking
strategies is included in Chapter Three. This
unit,  in  addition  to  introducing  students  to  several 
test-taking  strategies,  also  provides  opportunities  for 
them  to  interact  with  each  other  in  expressing  feelings 
and  attitudes  about  taking  tests.  In  this  respect  it  is 
different from other programs which focus only on the cognitive
aspect of test-taking while neglecting the affective
component. 


CHAPTER  III 

METHODS  AND  PROCEDURES 

Many important decisions regarding children's educational
programs are based on standardized test scores.
Children  need  guidance  and  instruction  in  test-taking 
strategies  in  order  to  minimize  error  variance  in  test 
scores  and  to  help  them  provide  an  accurate  sample  of  their 
achievement. This study attempted to assess the effectiveness
of a series of training sessions in test-wiseness on
the standardized reading test scores of the children receiving
the instruction. In addition, an attempt was made to
determine whether the degree of test anxiety reported by
the children had any bearing on the effectiveness of training
and if levels of test anxiety changed as a result of
receiving  the  instruction. 

The  methodology  for  the  study  is  presented  in  Chapter 
III.  It  includes  a description  of  the  guidance  unit  to  be 
used.  In  addition,  population  and  sampling  procedures  are 
described,  as  well  as  the  hypotheses,  research  design, 
instruments,  procedures,  and  analyses  of  data. 

The  Guidance  Unit 

The guidance unit in test-taking strategies utilized
in this study consisted of six thirty- to forty-five-minute
classroom sessions with groups of sixth grade students.
After reviewing several programs in test-wiseness
training, the investigator selected those strategies considered
to be helpful to elementary school students, taking
classroom as well as standardized multiple choice tests
into consideration. These included following directions,
guessing strategies, efficient use of time, and looking for
clues in sentence stems and in response options. Interspersed
with the cognitive materials were activities dealing
with students' attitudes and feelings about taking
tests.  Students  had  opportunities  for  both  large  and  small 
group  activities  in  each  session. 

This  study  represented  a comprehensive  program  of 
test-taking  skills  which  included  those  strategies  which 
seemed  to  be  appropriate  for  sixth  grade  students.  For 
example,  little  time  was  spent  on  test  format  and  marking 
answer  sheets,  since  sixth  grade  students  usually  have 
mastered these skills. Opportunities to recognize and discuss
feelings about taking tests were included in order to
acknowledge  the  affective  component  of  the  testing  situation. 

The organization and format of the sessions were designed
to give variety to the classroom sessions and to
allow the students to interact with each other in the small
group activities. This format followed that of the 1983
Orange  County  Project  conducted  by  Dr.  R.  D.  Myrick  of  the 
University  of  Florida  in  consultation  with  elementary  school 
counselors  and  teachers  in  Orlando,  Florida. 




In  summary,  the  researcher  prepared  and  utilized  a 
guidance  unit  in  test-taking  strategies  which  consisted  of 
six thirty- to forty-five-minute classroom sessions. It was
designed to be used with groups of sixth grade students.
The unit addressed both cognitive and affective components
of  the  testing  situation.  An  outline  of  the  unit  may  be 
seen  in  Appendix  A. 

Population  and  Sample 

Population 

Three  hundred  students  from  two  sixth  grade  centers 
took  part  in  the  study.  The  Beaches  Sixth  Grade  Center 
(n=155) and the Susie Tolbert Sixth Grade Center (n=145)
in  Jacksonville,  Florida,  provided  the  student  population. 
Each  center  is  a racially  integrated  school  and  has  about 
a 70%  white  and  30%  minority  population,  with  the  same 
ratio  of  white  to  minority  in  the  instructional  staffs. 

These  schools  serve  a large  area  within  the  city,  which 
means  that  most  of  the  students  are  bussed  and  that  a wide 
range  of  socioeconomic  levels  exists. 

Sample 

There  were  22  sixth  grade  classes  at  the  Jacksonville 
Beach  center  and  26  classes  at  the  Susie  Tolbert  center. 

At each of the schools the investigator explained the
guidance unit and the research procedures to the entire
faculty and asked for teacher volunteers to participate in
the study. From the list of volunteers, the investigator
used  a table  of  random  numbers  to  randomly  select  six 
classes  in  each  school.  Three  classes  from  each  school 
were  then  randomly  assigned  to  the  experimental  group  (E) 
and  received  the  guidance  unit  on  test-taking  strategies. 
The  other  three  classes  were  assigned  to  the  control  group 
(C)  and  did  not  receive  the  guidance  unit  until  the  study 
was  completed. 
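
The two-stage procedure described above (random selection of six volunteered classes in each school, followed by random assignment of three classes to each condition) can be sketched in Python. This is an illustrative sketch only: the study used a printed table of random numbers, for which a seeded pseudorandom generator is substituted here, and the class labels, the number of volunteers, and the function name `select_and_assign` are hypothetical.

```python
import random


def select_and_assign(volunteer_classes, n_selected=6, seed=None):
    """Randomly pick n_selected classes, then split them evenly into E and C.

    Hypothetical helper: the study used a table of random numbers; a
    pseudorandom generator stands in for it here.
    """
    rng = random.Random(seed)
    selected = rng.sample(volunteer_classes, n_selected)  # stage 1: selection
    rng.shuffle(selected)                                  # stage 2: assignment
    half = n_selected // 2
    return {"E": sorted(selected[:half]), "C": sorted(selected[half:])}


# Made-up volunteer lists; the source does not report how many teachers volunteered.
schools = {
    "Beaches": [f"B{i:02d}" for i in range(1, 11)],
    "Susie Tolbert": [f"T{i:02d}" for i in range(1, 13)],
}

assignment = {name: select_and_assign(classes, seed=1984)
              for name, classes in schools.items()}
for name, groups in assignment.items():
    print(name, groups)
```

Because whole classes rather than individual students are assigned, the class is the unit of randomization, which is why the later analysis treats classes as nested within treatment groups.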

Hypotheses 

The  following  major  null  hypotheses  were  tested. 

HO1: There will be no significant interaction between

levels  of  achievement  and  treatment  effect  on  reading  test 
scores,  as  measured  by  the  Comprehensive  Tests  of  Basic 
Skills  as  a result  of  the  guidance  unit. 

HO2:  There  will  be  no  significant  difference  between 

experimental  and  control  groups  of  sixth  grade  students  in 
reading  test  scores,  as  measured  by  the  Comprehensive  Tests 
of  Basic  Skills  as  a result  of  the  guidance  unit. 

HO3:  There  will  be  no  significant  interaction  between 

levels  of  anxiety  as  measured  by  the  Children's  Test 
Anxiety  Scale  and  treatment  effect  on  reading  test  scores 
as  measured  by  the  Comprehensive  Tests  of  Basic  Skills  as 
a result  of  the  guidance  unit. 

HO4:  There  will  be  no  significant  difference  between 

experimental  and  control  groups  of  sixth  grade  students  in 


level  of  test  anxiety  as  measured  by  the  Children's  Test 
Anxiety  Scale  as  a result  of  the  guidance  unit. 

Research  Design 

The research design used in this study was a control
group pretest-posttest design (Ary, Jacobs, & Razavieh,
1979). The experimenter worked with intact classes. It
was not possible to use randomization procedures with
individual students. Of those classes volunteered by
teachers, six in each school were randomly assigned to
experimental and control groups of the study.

Sources of internal invalidity for this design include
the possible interaction effect between selection and
other extraneous variables that might be mistaken for
treatment effects and the effects of statistical regression.
The random selection of classes which participated
in the guidance unit helped to control for these difficulties.
Also, the use of analysis of covariance appropriate
for a hierarchical design helped to compensate for
initial differences and to control for group effect.

Threats  to  external  validity  for  this  design  include 
the  use  of  a pretest,  which  could  sensitize  the  subjects 
to  the  treatment.  The  use  of  intact  classes  helped  con- 
trol for  reactive  effects  of  the  experiment,  since  the  stu- 
dents were  accustomed  to  having  resource  teachers  come  into 
their  classrooms  for  art  and  music  instruction  and  were 
probably less aware of an experiment being conducted.


Instruments 


Two  instruments  were  utilized  in  this  study.  The 
reading  comprehension  subtest  of  the  Comprehensive  Tests 
of Basic Skills, Forms U and V, was used to assess the
effectiveness  of  the  guidance  unit  in  test-taking  strate- 
gies. The  Children's  Test  Anxiety  Scale  measured  the  test 
anxiety  level  of  the  students  in  the  experimental  and  con- 
trol groups.  A description  of  each  of  the  two  instruments 
follows. 

Comprehensive  Tests  of  Basic  Skills  (CTBS) 

The  Comprehensive  Tests  of  Basic  Skills,  Forms  U and 
V, consist of a series of norm-referenced, objectives-based
tests  for  kindergarten  through  twelfth  grade.  The  series 
is  designed  to  measure  achievement  in  the  basic  skills 
commonly  identified  in  state  and  district  curricula.  Since 
the  tests  contain  characteristics  of  both  norm-referenced 
and  criterion-referenced  tests,  they  provide  information 
about  the  relative  ranking  of  students  against  a norm  group 
as  well  as  specific  information  about  the  instructional 
level  of  students. 

The  sampling  procedures  for  CTBS  were  designed  to 
provide  fall  and  spring  norms.  The  combined  fall  and 
spring  norming  sample  consisted  of  approximately  250,000 
students  in  grades  K through  12.  The  basic  sampling  units, 
referred to as districts, were public school districts,
Catholic dioceses and archdioceses, and other private
schools grouped by county. The primary data source for the
districts was data tapes prepared by the Curriculum Information
Center, Inc. A secondary source was the California
Achievement  Tests,  Forms  C and  D,  standardization  sampling 

The school districts were divided into three subpopulations:
public, Catholic, and private. The private
subpopulation contained some public schools, such as some
state-administered  schools,  Indian  reservation  schools,  and 
schools  associated  with  hospitals. 

The districts in the public subpopulation were stratified
into four geographic regions: New England and the
Mideast, Great Lakes and Plains, the Southeast, and the
Southwest  and  West.  They  were  further  stratified  according 
to  community  type — urban,  suburban,  or  rural — and  into  two 
size  categories — large  and  small — based  on  estimated  fourth 
grade  enrollment.  Districts  within  each  size  category  were 
then  partitioned  into  four  cells  based  on  their  demographic 
index,  or  an  estimate  of  mean  district  performance  on 
standardized  achievement  tests  in  grades  six  through  eight. 
The  Catholic  and  private  districts  were  stratified  into  the 
four  geographic  regions.  In  all  there  were  eighty-six 
cells, with an average quota of eighty-four students per
cell.

To obtain the testing sample, first a district was
randomly selected for each cell from all of the districts
in the cell. Secondly, schools were randomly selected
from all of the schools in the district until there was a
sufficient  number  of  students  for  each  grade.  Within  the 
selected  high  schools,  two  classes  were  randomly  selected 
for  Grade  9 and  one  class  each  for  Grades  10,  11,  and  12. 

The  policy  regarding  special  education  students  was  to 
exclude  only  those  students  who  were  not  included  in  any  of 
the  group  achievement  test  programs  for  the  school  district. 
Ethnic  composition  of  the  groups  appeared  to  be  reasonably 
close  to  1978  figures  from  the  National  Center  for  Educa- 
tion Statistics. 

As  stated  in  the  technical  report  of  the  CTBS,  Forms  U 
and  V,  the  paramount  aim  of  the  tests  is  to  provide  valid 
measurement  of  academic  basic  skills  in  reading,  spelling, 
language,  mathematics,  reference  skills,  science,  and 
social  studies.  In  order  to  identify  the  educational  ob- 
jectives to  be  measured,  state  and  district  curriculum 
guides,  textbook  series,  instructional  programs,  and  norm- 
referenced  and  criterion-referenced  assessment  instruments 
were  reviewed.  The  basic  skills  identified  as  common  to 
most  curricula  were  then  compared  to  the  objectives  of 
other  CTB/McGraw-Hill  products,  which  included  the 
California  Achievement  Tests,  Forms  C and  D,  the  PRI 
Reading  Systems,  and  the  Diagnostic  Mathematics  Inventory. 
In  addition,  attention  was  paid  to  content  validity  by 
using  the  Bloom  taxonomy  for  the  cognitive  domain  as  a 


partial  basis  for  classifying  the  objectives.  From  the  com- 
pilation of  educational  objectives,  the  content  of  CTBS  U 
and  V was  selected.  A staff  of  professional  item  writers, 
mostly  experienced  teachers,  researched  and  wrote  items  and 
passages  to  be  tried  out.  Careful  attention  was  given  to 
questions  of  ethnic,  racial,  age,  and  gender  bias  by  having 
all  materials  reviewed  by  professionals  who  represented 
various  ethnic  groups.  These  reviewers  were  asked  their 
input  as  to  appropriateness  of  language,  subject  matter, 
and  representation  of  the  ethnic  groups.  Also,  statistical 
procedures  for  detecting  item  bias  were  carried  out  and 
further  deletions  were  made. 

Further  attention  to  content  validity  was  given  by 
administering CTBS U jointly with appropriate levels of the
Test of Cognitive Skills in the fall of 1980. Intercorrelation
coefficients for Grade 6 (n=4127) ranged from
.40  to  .97.  In  addition,  intercorrelation  coefficients 
between  CTBS  S and  U for  Grade  6 ranged  from  .54  to  .94. 

Information  on  reliability  of  the  CTBS  is  lacking.  In 
order to measure internal consistency, the Kuder-Richardson
formula was applied. Split-half coefficients ranged in the
.80's and .90's. Standard errors of measurement in scale
score  units  were  presented  as  a further  description  of  the 
reliability  of  the  tests.  No  information  was  available  as 
to  test-retest  reliability  or  alternate  forms  reliability. 
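
As an illustration of the internal consistency estimate mentioned above, the following sketch computes a Kuder-Richardson coefficient from dichotomously scored items. The source does not say which Kuder-Richardson formula was applied; formula 20 is assumed here, with the population variance of total scores. The function name `kr20` and the response matrix are hypothetical and are not CTBS data.

```python
from statistics import pvariance


def kr20(item_matrix):
    """KR-20 for a matrix with one row per examinee and one 0/1 column per item.

    Assumption: formula 20 with population variance of total scores.
    """
    n_examinees = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    total_variance = pvariance(totals)
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_examinees  # proportion correct
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Made-up responses for four examinees on three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(responses), 2))
```

A coefficient of this kind reflects consistency among items within a single administration; it does not substitute for the test-retest or alternate forms evidence that the text notes was unavailable.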


Children's Test Anxiety Scale (CTAS)


The  Children's  Test  Anxiety  Scale  is  an  instrument 
designed  by  the  researcher  to  measure  the  level  of  test 
anxiety  experienced  by  elementary  school  students  before 
and  during  a test  experience.  The  CTAS  consists  of  twenty 
items.  It  is  read  to  students  by  the  examiner  while  they 
read  along  silently.  Students  respond  to  each  item,  using 
a Likert-type  scale  of  strongly  agree,  agree,  unsure,  dis- 
agree, and  strongly  disagree,  with  corresponding  values  of 
5,  4,  3,  2,  1 used  for  scoring.  A copy  of  the  CTAS  appears 
in  Appendix  B. 
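
The scoring rule just described can be sketched as follows. The sketch assumes that a student's score is the simple sum of the twenty item values and that no items are reverse-scored; the source does not state either point explicitly. The names `LIKERT_VALUES` and `score_ctas` are hypothetical.

```python
# Response values as given in the text: strongly agree = 5 ... strongly disagree = 1.
LIKERT_VALUES = {
    "strongly agree": 5,
    "agree": 4,
    "unsure": 3,
    "disagree": 2,
    "strongly disagree": 1,
}
N_ITEMS = 20


def score_ctas(responses):
    """Return the total score for one student's twenty responses.

    Assumption: total = sum of item values, with no reverse-scored items.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(LIKERT_VALUES[r.strip().lower()] for r in responses)


# Hypothetical student who answers "agree" to ten items and "unsure" to ten.
example = ["agree"] * 10 + ["unsure"] * 10
print(score_ctas(example))
```

Under these assumptions total scores can range from 20 to 100, with higher scores indicating greater reported test anxiety.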

Prior  to  developing  the  CTAS,  three  test  anxiety 
measures were reviewed: the Test Anxiety Scale (Sarason,
1958), the Test Anxiety Scale for Children (Sarason et al.,
1960), and the Test Anxiety Inventory (Spielberger et al.,
1978). None of these was considered satisfactory for this
study.  Therefore,  selected  items  from  these  scales  were 
used  along  with  items  developed  from  the  researcher's  obser- 
vation of  elementary  school  students. 

The  Test  Anxiety  Scale  for  Children  (Sarason  et  al., 
1960), which Sarason adapted from his Test Anxiety Scale
(Sarason, 1958), consists of thirty items which are written
in the form of questions, to which children respond with a
"yes" or "no." This measure was not considered appropriate
for  the  present  study,  since  eighteen  of  the  items  deal 
with  school  anxiety  in  general  rather  than  with  test  anxiety 




specifically.  Of  the  other  twelve  items  of  the  TASC,  only 
five  focus  directly  on  the  testing  experience.  These  items 
were  restated  in  sentence  form  and  comprise  items  two 
through  six  of  the  CTAS. 

From  the  investigator's  observations  and  experience 
with  testing  school  children,  items  7,  8,  14,  17,  and  18 
were  written.  These  items  were  reviewed  by  the  classroom 
teachers  involved  in  the  study.  They  believed  that  the 
items  reflected  valid  observations  of  children's  test-taking 


The  Spielberger  et  al.  Test  Anxiety  Inventory  (1978) 
was  designed  for  college  students.  It  is  a more  recent 
measure  than  the  other  two,  and  some  items  were  indicative 
of  concerns  of  elementary  school  students.  The  remaining 
items  of  the  CTAS,  which  were  considered  appropriate  for 
sixth  grade  students,  were  selected  and  adapted  from  this 
instrument. 


Ten of the items (2, 3, 5, 6, 7, 8, 9, 12, 13, 20) deal
with the cognitive or worry component of test
anxiety.  The  other  ten  (1,  4,  10,  11,  14,  15,  16,  17,  18, 
19)  focus  on  the  behavioral  or  observable  manifestations  of 
test  anxiety.  The  Likert-type  scale  was  chosen  because  it 
was  felt  that  this  type  of  scale  would  yield  a more  precise 
assessment  of  test  anxiety  than  the  true-false  or  yes-no 
responses  on  the  other  instruments.  Because  of  its  similar- 
ity to  other  test  anxiety  instruments,  the  CTAS  can  be  said 
to  have  face  validity. 
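
The division of the twenty items into two components can be sketched as below. The behavioral item numbers are those listed in the text; the worry items are derived here as the remaining ten item numbers, which is an inference from the phrase "the other ten." The source does not report separate subscale scores, so summing within each component is an assumption, and `subscale_scores` is a hypothetical name.

```python
# Behavioral items as listed in the text; worry items inferred as the complement.
BEHAVIORAL_ITEMS = (1, 4, 10, 11, 14, 15, 16, 17, 18, 19)
WORRY_ITEMS = tuple(i for i in range(1, 21) if i not in BEHAVIORAL_ITEMS)


def subscale_scores(item_values):
    """Sum item values (1 to 5) within each component; item 1 comes first.

    Assumption: a component score is the plain sum of its ten items.
    """
    if len(item_values) != 20:
        raise ValueError("expected values for all twenty items")
    return {
        "worry": sum(item_values[i - 1] for i in WORRY_ITEMS),
        "behavioral": sum(item_values[i - 1] for i in BEHAVIORAL_ITEMS),
    }


# Hypothetical student: 4 on every behavioral item, 2 on every worry item.
values = [4 if i in BEHAVIORAL_ITEMS else 2 for i in range(1, 21)]
print(WORRY_ITEMS)
print(subscale_scores(values))
```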


The  reliability  of  the  CTAS  was  investigated  prior  to 
its  use  in  this  study.  It  was  administered  to  fifty-six 
students  in  two  sixth  grade  classes  which  were  randomly 
selected  from  a group  of  classrooms  whose  teachers  volun- 
teered to  participate.  These  classes  were  not  involved  in 
the  current  study.  It  was  then  readministered  one  week 
later  to  the  same  population.  A test-retest  correlation 
of  .93  (Pearson  product  moment  correlation)  was  obtained. 
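
The test-retest coefficient reported above is a Pearson product moment correlation between the two administrations. A minimal sketch of that computation follows; the scores shown are made up for illustration and are not the pilot data, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up total scores for five students on the first and second administration.
first = [62, 48, 75, 55, 80]
second = [60, 50, 73, 58, 82]
print(round(pearson_r(first, second), 2))
```

Each student's two scores must be paired in the same order, since the coefficient measures how well the rank and spacing of students is preserved from the first administration to the second.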

Experimental  Procedures 

This  study  began  in  February  of  1984.  It  encompassed 
five  weeks  and  ended  in  March  of  1984.  After  identifying 
six  classes  in  each  of  two  schools,  classes  were  randomly 
assigned  to  experimental  and  control  groups  and  pretested, 
using  the  reading  comprehension  subtest  of  the  Comprehen- 
sive Tests  of  Basic  Skills,  Form  U,  and  the  Children's  Test 
Anxiety  Scale. 

During  the  first  week  of  the  study  the  investigator 
met  with  the  teachers  of  the  experimental  classes  to  ex- 
plain the  organization  of  the  classroom  and  the  procedures 
to  be  followed.  The  treatment  for  the  experimental  groups, 
conducted  by  the  investigator  with  the  classroom  teachers 
present,  began  with  the  guidance  unit  on  test-taking 
strategies  (Weeks  2-4) . The  unit  consisted  of  six  class- 
room sessions.  The  guidance  unit  is  presented  in  Appendix 
A. Members of the control groups participated in review
sessions with their teachers as part of their preparation
for the Stanford Achievement Test, a test administered each
spring in the Duval County School System. Finally, experimental
and control groups were posttested, using the
reading comprehension subtest of the CTBS, Form V, and the
CTAS.


Following  collection  of  data,  analyses  of  covariance 
appropriate  for  a hierarchical  design  were  used  to  test  the 
hypotheses.  ANCOVA  is  a method  for  analyzing  differences 
between  experimental  and  control  groups  on  the  dependent 
variable  after  taking  into  account  any  initial  differences 
between  the  groups  on  the  pretest  measure  (Ary,  Jacobs,  & 
Razavieh, 1979). It partially controls extraneous variables
that confound the relationship between the independent
variable and the dependent variable and is especially
useful  when  the  researcher  must  use  intact  classroom  groups. 
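
The adjustment for initial differences described above can be illustrated with covariate-adjusted posttest means. The sketch below is deliberately simplified: it uses a single pooled within-group regression slope and ignores the nesting of classes within treatment groups, so it is not the hierarchical analysis used in the study. All data and the name `adjusted_means` are hypothetical.

```python
def adjusted_means(groups):
    """Return covariate-adjusted posttest means for each group.

    groups maps a group name to a list of (pretest, posttest) pairs.
    Simplification: one pooled within-group slope, no class-level nesting.
    """
    all_pre = [pre for pairs in groups.values() for pre, _ in pairs]
    grand_pre = sum(all_pre) / len(all_pre)
    sxy = sxx = 0.0
    means = {}
    for name, pairs in groups.items():
        n = len(pairs)
        mean_pre = sum(pre for pre, _ in pairs) / n
        mean_post = sum(post for _, post in pairs) / n
        means[name] = (mean_pre, mean_post)
        sxy += sum((pre - mean_pre) * (post - mean_post) for pre, post in pairs)
        sxx += sum((pre - mean_pre) ** 2 for pre, _ in pairs)
    slope = sxy / sxx  # pooled within-group regression of posttest on pretest
    return {name: mean_post - slope * (mean_pre - grand_pre)
            for name, (mean_pre, mean_post) in means.items()}


# Made-up scores: the control group starts higher on the pretest.
data = {
    "E": [(1, 2), (2, 3), (3, 4)],
    "C": [(3, 3), (4, 4), (5, 5)],
}
print(adjusted_means(data))
```

In this made-up example the control group has the higher raw posttest mean, but after adjustment for its higher pretest mean the experimental group's adjusted mean is the larger of the two.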


CHAPTER  IV 
RESULTS OF THE STUDY

This  study  investigated  the  effects  of  a sixth  grade 
classroom  guidance  unit  in  test-taking  strategies  on 
standardized  reading  comprehension  test  scores  and  on  test 
anxiety  levels.  Two  dependent  measures  were  used  to  assess 

Students in experimental and control groups were pretested,
using the reading comprehension subtest of the
Comprehensive Tests of Basic Skills (CTBS), Form U, and the
Children's Test Anxiety Scale (CTAS). Students in the six
classes randomly assigned to the experimental treatment
then participated in a six-session guidance unit on test-taking
strategies. Students in the six classes randomly
assigned  to  the  control  group  received  classroom  review 
sessions  on  content  material  in  reading  and  mathematics  in 
preparation  for  the  spring  administration  of  the  Stanford 
Achievement  Test.  After  the  unit  was  completed,  experimen- 
tal and  control  groups  were  posttested,  using  the  reading 
comprehension  subtest  of  the  CTBS,  Form  V,  and  the  CTAS. 

Posttest  data  from  the  two  measures  used  in  the  study 
were collected for 300 students (E=145; C=155). An
analysis  of  covariance  appropriate  for  a hierarchical 


design  was  used  to  test  each  of  the  four  null  hypotheses  at 
the  .05  level.  Pretest  scores  were  used  as  covariates. 

Descriptive  Statistics 

Table  4-1  presents  means  and  standard  deviations  for 
the six experimental and six control classrooms on the posttest
measures of achievement (CTBS) and anxiety (CTAS).

Looking  at  the  posttest  scores  from  the  CTBS,  it  can 
be  seen  that  in  general  the  results  are  in  the  hypothesized 
direction;  that  is,  the  scores  of  the  experimental  groups 
are  slightly  higher  than  those  of  the  control  groups.  How- 
ever, it  is  evident  that  large  variances  exist  between  and 
within  classrooms  in  both  the  experimental  and  control 
groups,  suggesting  that  there  were  variables  other  than  the 
treatment  which  affected  the  test  scores. 

On  the  CTAS  the  results  are  also  in  the  hypothesized 
direction.  The  test  anxiety  scores  of  the  experimental 
groups  are  lower  than  those  of  the  control  groups.  Again, 
variances  between  and  within  classrooms  are  evident  in  both 
the  experimental  and  control  groups. 

Effects  on  Reading  Achievement 

HO1: There will be no significant interaction

between  pretest  level  of  achievement  and 
treatment  effect  on  reading  posttest 
scores,  as  measured  by  the  Comprehensive 
Tests  of  Basic  Skills,  as  a result  of  the 
guidance  unit. 


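Table 4-1
Posttest Means and Standard Deviations
By Group and By Class

(Table body not recoverable; surviving fragments follow, with X = mean and S = standard deviation.)

35.30  32.7
X  49.31  52.79  50.77
S  13.66  14.02  15.03
54.47  60.07
16.05  10.92
17.85  12.55
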


An ANCOVA was performed to test the interaction between
initial level of anxiety and treatment effect on posttest
reading scores. No significant interaction was found,
F(1,10) = .13, p > .05. Therefore, the null hypothesis relating
to  the  interaction  between  test  anxiety  level  and  treatment 
effect  was  not  rejected. 

Effects  on  Test  Anxiety 

HO4: There will be no significant difference

between  experimental  and  control  groups 
of  sixth  grade  students  in  level  of  test 
anxiety,  as  measured  by  the  Children's 
Test  Anxiety  Scale,  as  a result  of  the 
guidance  unit. 

The Children's Test Anxiety Scale was administered to
students  in  the  six  classrooms  randomly  assigned  to  the 
experimental  group  and  to  the  six  assigned  to  the  control 
group  before  and  after  the  treatment.  The  scores  were 
analyzed  using  analysis  of  covariance  appropriate  for  a 
hierarchical design. The analysis failed to show a significant
reduction in reported test anxiety as a result of the
treatment, F(1,10) = 3.65, p > .05.

When  there  are  non-significant  differences  among 
classes  within  treatment  groups,  one  approach  to  testing 
the  treatment  effect  is  to  form  an  error  term  by  pooling 
the  class  within  treatment  group  sums  of  squares  with  sub- 
jects within  classes  within  treatment  group  sums  of  squares. 
In  the  current  analysis  there  were  marginally  significant 
differences among classes within treatment groups,
F(10,289) = 1.84, p = .0539. Taking the pooling approach, the
treatment effect is significant, F(1,289) = 6.52, p < .05.

Using  this  approach,  the  null  hypothesis  relating  to  the 
effect  of  the  treatment  on  level  of  test  anxiety  was 
rejected. 
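
The difference between the two error terms can be made concrete with a short sketch. In the nested analysis the treatment mean square is tested against the classes-within-treatment mean square; in the pooling approach the class and subject sums of squares are combined into one error term with the combined degrees of freedom. The source does not report the sums of squares, so the values below are made up, and the function name `f_ratios` is hypothetical.

```python
def f_ratios(ss_treatment, ss_classes, df_classes, ss_subjects, df_subjects,
             df_treatment=1):
    """Return F ratios for a design with classes nested within treatments."""
    ms_treatment = ss_treatment / df_treatment
    ms_classes = ss_classes / df_classes
    ms_subjects = ss_subjects / df_subjects
    # Pooled error: combine class and subject sums of squares and their df.
    ms_pooled = (ss_classes + ss_subjects) / (df_classes + df_subjects)
    return {
        "classes_within_treatment": ms_classes / ms_subjects,
        "treatment_nested": ms_treatment / ms_classes,
        "treatment_pooled": ms_treatment / ms_pooled,
    }


# Made-up sums of squares, chosen only to show how the two tests can differ.
ratios = f_ratios(ss_treatment=100, ss_classes=200, df_classes=10,
                  ss_subjects=2000, df_subjects=200)
for name, value in ratios.items():
    print(name, round(value, 2))
```

Pooling gives the treatment test many more error degrees of freedom, which is why the same treatment mean square can fall short of significance against the class-level error term yet reach it against the pooled term. The pooled test is usually defended only when class differences are clearly non-significant, so a marginal result for classes calls for caution in interpreting it.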

Summary 

The  effect  of  a guidance  unit  on  test-taking  strate- 
gies upon  standardized  reading  test  scores  and  upon  test 
anxiety  scores  was  investigated.  In  addition,  the  inter- 
action between  pretest  level  of  achievement  and  treatment 
effect  on  posttest  reading  scores  was  investigated,  as  well 
as  the  interaction  between  initial  level  of  anxiety  and 
treatment  effect  on  posttest  reading  scores.  Scores  were 
analyzed  using  analysis  of  covariance  appropriate  for  a 
hierarchical  design.  According  to  this  analysis,  there  was 
no  significant  effect  of  the  treatment  either  on  reading 
test  scores  or  on  test  anxiety  scores.  However,  when  an 
error  term  was  formed  by  pooling  the  class  within  treatment 
group  sums  of  squares  with  subjects  within  classes  within 
treatment  group  sums  of  squares,  a significant  treatment 
effect  on  test  anxiety  scores  was  seen.  There  was  no  sig- 
nificant interaction  effect  between  pretest  level  of 
achievement  or  anxiety  and  effect  of  treatment  on  reading 



The  data  furnished  by  the  two  instruments  used  in  the 
study  were  analyzed  using  an  analysis  of  covariance  appro- 
priate for  a hierarchical  design  to  test  for  significant 
differences  in  the  experimental  and  control  groups.  Pre- 
tests served  as  covariates. 

Four null hypotheses were tested at the .05 level of
significance. The first null hypothesis to be examined
focused  on  whether  there  was  an  interaction  between  the 
level  of  reading  achievement  as  determined  by  the  pretest 
and  the  effects  of  the  guidance  unit  on  the  posttest  read- 
ing scores.  The  ANCOVA  procedure  failed  to  show  an  inter- 
action, and  this  hypothesis  was  not  rejected. 

The  second  null  hypothesis  considered  in  the  study  was 
concerned  with  whether  a guidance  unit  on  test-taking 
skills  would  affect  scores  on  a standardized  reading 
achievement  test.  Again,  the  ANCOVA  failed  to  show  a sig- 
nificant difference  between  the  experimental  and  control 
groups.  This  hypothesis  was  not  rejected. 

The  third  null  hypothesis  examined  the  possible  inter- 
action between  level  of  test  anxiety  as  determined  by  the 
pretest  and  effects  of  the  treatment  on  posttest  reading 
scores.  No  interaction  was  shown  by  the  analysis,  and  this 
null  hypothesis  was  not  rejected. 

The  fourth  null  hypothesis  dealt  with  whether  there 
would  be  a significant  reduction  in  levels  of  test  anxiety 
as  a result  of  the  guidance  unit.  The  ANCOVA  failed  to 


show  a significant  reduction  in  test  anxiety.  According  to 
this  analysis,  the  fourth  null  hypothesis  was  not  rejected. 
However, since the F test for differences among classes
within treatments was only marginally significant, a
different approach to testing the treatment effect was
also used. An error term was formed by pooling the class
within group sums of squares with subjects within classes
within group sums of squares. Taking
the  pooling  approach,  the  treatment  effect  is  significant, 
and  the  fourth  null  hypothesis  was  rejected. 

Conclusions 

One  conclusion  that  can  be  reached  as  a result  of  this 
study  has  to  do  with  the  fact  that  there  was  no  significant 
difference  in  reading  test  scores  as  a result  of  the 
guidance  unit  and  that  there  was  no  interaction  between 
reading  levels  on  the  pretest  and  effect  of  the  unit  on 
posttest  scores.  Those  students  who  were  high  achievers 
scored  at  the  upper  end  of  the  scale  on  the  pretest,  and  so 
it  was  not  possible  for  them  to  show  gains  of  more  than  a 
few  points  on  the  posttest.  This  was  to  be  expected.  The 
students  most  in  need  of  this  type  of  guidance  are  those 
who  consistently  score  low  on  classroom  and  standardized 
tests.  That  these  students  did  not  improve  their  scores  as 
a result  of  receiving  the  classroom  guidance  unit  may  indi- 
cate that  the  highly  cognitive,  verbal  approach  used  was 
not  effective  for  low  achieving  students. 


It  is  more  convenient  in  terms  of  time  spent  and  ease 
of  scheduling  to  deliver  services  to  intact  classroom 
groups.  However,  the  researcher  observed  throughout  the 
guidance  sessions  that  the  low  achieving  students  had  diffi- 
culty in  paying  attention  to  the  large  group  presentations 
and  in  cooperating  in  the  small  group  activities  within  the 
classroom  group.  They  might  respond  better  to  small  group 
sessions  outside  the  classroom.  They  have  developed  nega- 
tive attitudes  as  a result  of  constant  failure  in  school. 
They  need  attention  in  this  area  before  they  can  have  the 
confidence  they  need  to  work  on  test  scores. 

Even  though  students  did  not  improve  their  scores  as  a 
result  of  reducing  their  test  anxiety,  they  did  indicate 
throughout  the  sessions  that  they  felt  much  more  positive 
about  taking  tests.  Some  said  that  it  helped  to  be  able  to 
talk  about  it  and  to  know  that  others  felt  the  same  way  they 
did.  In  this  respect,  the  classroom  sessions  accomplished 
one  of  the  goals  of  group  guidance;  that  is,  to  provide 
catharsis  and  relieve  tension. 

One  of  the  purposes  of  the  study  was  to  help  clarify 
the  relationship  between  test-wiseness  and  anxiety.  It 
appears  that  the  training  in  test-taking  skills  may  have 
had  an  effect  on  test  anxiety,  but  the  effect  was  of  the 
same  magnitude  regardless  of  students'  initial  anxiety 
level.  At  the  end  of  the  study  students  who  were  more 
highly  anxious  on  the  test  anxiety  pretest  tended  to  be 




lower  achievers  on  the  reading  posttest  than  those  who  were 
initially  low  in  test  anxiety. 

Limitations 

The  following  limitations  of  this  research  study  were 
recognized: 

1.  The  fact  that  the  researcher  conducted  the  guid- 
ance sessions  could  have  produced  an  experimenter  effect. 
However,  the  staffs  of  the  schools  included  resource  art, 
music,  and  physical  education  teachers,  as  well  as  a school 
counselor,  so  the  students  were  accustomed  to  having  indi- 
viduals other  than  the  regular  classroom  teacher  working 
with  them.  Also,  their  regular  teachers  were  present  dur- 
ing the  sessions. 

2.  The  study  was  conducted  in  the  spring  when  em- 
phasis was  already  being  placed  upon  preparing  for  the 
Stanford  Achievement  Test  and  the  Essential  Skills  Tests, 
which  were  administered  in  April  and  May.  All  teachers 
were  spending  time  coaching  the  students  in  content  materi- 
als similar  to  the  formats  to  be  used  in  the  standardized 
tests.  This  could  have  had  an  effect  on  the  results,  since 
the  coaching  time  spent  was  greater  than  the  amount  of  time 
spent  strictly  on  test-taking  skills. 

3.  The  Children's  Test  Anxiety  Scale,  used  to  measure 
pre-  and  posttest  anxiety,  was  not  a standardized  measure. 
However,  it  was  felt  that  efforts  to  insure  validity  and 
reliability  were  sufficient  for  the  purposes  of  this  study. 




Recommendations 

The  following  recommendations  are  made  based  on  the 
results  of  this  investigation: 

1.  Many  students  in  the  elementary  school  need  help 
in  learning  how  to  take  tests.  However,  it  appears  that 
instead  of  a highly  cognitive,  verbal  approach,  they  need 
more  intensive  practice  and  supervision  and  more  opportuni- 
ties to  receive  feedback  than  this  unit  provided. 

2.  The  classroom  organization  implemented  in  this 
study  was  more  successful  with  students  who  were  self- 
directed  and  worked  well  together.  These  students  responded 
well  to  the  unit  as  a group.  In  classes  where  many  stu- 
dents needed  individual  attention,  it  was  difficult  to 
conduct  the  sessions.  In  a classroom  unit  such  as  the  one 
used  in  this  study,  the  background  of  the  classes  should 

be  considered  before  attempting  large  group  sessions.  In 
some  classes  advance  preparation  would  be  helpful.  Others 
might  need  a more  structured  approach  with  less  opportunity 
for  interaction  and  more  practice  time.  Still  others  might 
need  to  work  entirely  in  small  groups. 

3.  A larger  study  involving  more  classes  and  incor- 
porating the  above  recommendations  might  yield  different 
results.  This  could  be  accomplished  by  providing  training 
in  test-taking  skills  as  part  of  the  regular  classroom 


curriculum. 




4.  Guidance  units  in  test-taking  skills  geared  to 
younger  children  should  be  implemented  in  the  primary  grades 
before  the  children  have  experienced  so  many  years  of 

5.  Better  results  might  be  obtained  if  the  unit  were 
presented  at  a different  time  in  the  school  year.  For 
example,  the  outcome  might  have  been  different  in  the  fall, 
before  emphasis  on  reviewing  for  tests  began. 


Implications 

Standardized  test  scores  have  a tremendous  impact  on 
major  decisions  made  about  a child's  educational  program. 
The  effect  on  the  child  of  these  decisions  is  difficult  to 
assess. Test scores can influence others' opinions about a


child  as  well  as  the  child's  own  way  of  viewing  himself  or 
herself.  Children  have  a right  to  do  as  well  as  they 
possibly  can  on  a standardized  test. 

It  is  difficult  to  effect  improvement  in  the  students 
who  are  most  in  need  of  it,  the  low  achievers  in  the  class- 
room. According  to  this  study,  the  large  group,  short-term, 
verbal  approach  is  ineffective  in  improving  achievement, 
but  may  have  an  effect  (less  clear)  on  test  anxiety.  If 
test-wiseness  principles  could  be  integrated  into  the  curri- 
culum of  the  elementary  school  classroom,  students  would 
have  continuous,  long-term  training  in  test-taking  skills, 
which  might  produce  the  desired  effect  on  both  achievement 
and  test  anxiety. 




anxiety.  Journal  of  Personality,  1953,  21,  336-341. 



Spiegler, M. D., Morris, L. W., & Liebert, R. M. Cognitive



BIOGRAPHICAL  SKETCH 


Annie  Louise  Guess  was  born  in  Eastport,  Florida,  to 
Raymond  and  Eula  Guess.  Her  early  years  were  spent  in 
Foley,  Florida,  where  she  attended  elementary  and  junior 
high  school.  She  completed  high  school  at  Taylor  County 
High  School  in  Perry,  Florida. 

After  completing  her  Bachelor  of  Arts  degree  in  modern 
languages  at  Florida  State  University  in  1948  and  becoming 
certified  in  elementary  school  education  shortly  after, 
she  taught  in  the  Duval  County  school  system.  In  1954  she 
returned  to  Florida  State  University  to  complete  the  Master 
of  Arts  degree  in  modern  languages.  She  then  did  post- 
graduate work  in  Spanish  at  the  Universidad  Nacional 
Autonoma  de  Mexico  in  Mexico  City.  After  returning  to  the 
classroom  for  a number  of  years,  she  became  interested  in 
the  field  of  guidance  and  counseling,  and  in  1974  she  com- 
pleted the  Master  of  Arts  degree  in  counseling  at 
Appalachian  State  University  and  became  an  elementary 
school  counselor.  In  1979  she  entered  the  doctoral  program 
in school counseling at the University of Florida. She
plans  to  go  into  private  practice  in  Jacksonville,  Florida. 




This dissertation was submitted to the Graduate Faculty of the College
of Education and to the Graduate School, and was accepted as partial
fulfillment of the requirements for the degree of Doctor of Philosophy.


Education 


Dean  for  Graduate  Studies  and  Research 


